Abstract — Polar codes are introduced for discrete memoryless
broadcast channels. For $m$-user deterministic broadcast channels,
polarization is applied to map uniformly random message bits
from $m$ independent messages to one codeword while satisfying
broadcast constraints. The polarization-based codes achieve rates
on the boundary of the private-message capacity region. For
two-user noisy broadcast channels, polar implementations are
presented for two information-theoretic schemes: i) Cover's
superposition codes; ii) Marton's codes. Due to the structure
of polarization, constraints on the auxiliary and channel-input
distributions are identified to ensure proper alignment of
polarization indices in the multi-user setting. The codes achieve rates
on the capacity boundary of a few classes of broadcast channels
(e.g., binary-input stochastically degraded). The complexity of
encoding and decoding is $O(n \log n)$ where $n$ is the block length.
In addition, polar code sequences obtain a stretched-exponential
decay of $O(2^{-n^{\beta}})$ of the average block error probability where
$0 < \beta < \frac{1}{2}$.

Index Terms — Polar Codes, Deterministic Broadcast Channel, 
Cover's Superposition Codes, Marton's Codes. 



I. Introduction 

INTRODUCED by T. M. Cover in 1972, the broadcast problem
consists of a single source transmitting $m$ independent
private messages to $m$ receivers through a single discrete,
memoryless, broadcast channel (DM-BC) [1]. The private-message
capacity region is known if the channel structure
is deterministic, degraded, less-noisy, or more-capable.
For general classes of DM-BCs, there exist inner bounds
such as Marton's inner bound [3] and outer bounds such as
the Nair-El-Gamal outer bound [4]. One difficult aspect of
the broadcast problem is to design an encoder which maps
$m$ independent messages to a single codeword of symbols
which are transmitted simultaneously to all receivers. Several
codes relying on random binning, superposition, and Marton's
strategy have been analyzed in the literature (see e.g., the
overview in [5]).

This work was presented in part at the International Zurich Seminar on 
Communications, Zurich, Switzerland on March 1, 2012, and submitted in 
part to the IEEE International Symposium on Information Theory on January, 
2013. 

N. Goela and M. C. Gastpar are with the Department of Electrical
Engineering and Computer Science, University of California, Berkeley, Berkeley,
CA 94720-1770 USA (e-mail: {ngoela, gastpar}@eecs.berkeley.edu) and also
with the School of Computer and Communication Sciences, Ecole
Polytechnique Federale (EPFL), Lausanne, Switzerland (e-mail: {naveen.goela,
michael.gastpar}@epfl.ch).

E. Abbe was with the School of Computer and Communication Sciences,
Ecole Polytechnique Federale (EPFL), Lausanne, Switzerland, and is currently
with the School of Engineering and Applied Sciences, Princeton University,
Princeton, NJ, 08544 USA (e-mail: {eabbe@princeton.edu}).



A. Overview of Contributions 

The present paper focuses on low-complexity codes for
broadcast channels based on polarization methods. Polar codes
were invented originally by Arikan and were shown to achieve
the capacity of binary-input, symmetric, point-to-point channels
with $O(n \log n)$ encoding and decoding complexity where
$n$ is the code length [6]. In this paper, we obtain the following
results.

• Polar codes for deterministic, linear and non-linear,
binary-output, $m$-user DM-BCs (cf. [7]). The capacity-achieving
broadcast codes implement low-complexity
random binning, and are related to polar codes for
other multi-user scenarios such as Slepian-Wolf distributed
source coding [8], [9], and multiple-access channel
(MAC) coding [10]. For deterministic DM-BCs, the
polar transform is applied to channel output variables.
Polarization is useful for shaping uniformly random
message bits from $m$ independent messages into
non-equiprobable codeword symbols in the presence of hard
broadcast constraints. As discussed in Section I-B1 and
referenced in [11]-[13], it is difficult to design
low-density parity-check (LDPC) codes or belief
propagation algorithms for the deterministic DM-BC due to
multi-user broadcast constraints.

• Polar codes for general two-user DM-BCs based on 
Cover's superposition coding strategy. In the multi-user 
setting, constraints on the auxiliary and channel-input dis- 
tributions are placed to ensure alignment of polarization 
indices. The achievable rates lie on the boundary of the 
capacity region for certain classes of DM-BCs such as 
binary-input stochastically degraded channels. 

• Polar codes for general two-user DM-BCs based on 
Marton's coding strategy. In the multi-user setting, due to
the structure of polarization, constraints on the auxiliary 
and channel-input distributions are identified to ensure 
alignment of polarization indices. The achievable rates lie 
on the boundary of the capacity region for certain classes 
of DM-BCs such as binary-input semi-deterministic chan- 
nels. 

• For the above broadcast polar codes, the asymptotic decay
of the average error probability under successive cancellation
decoding at the broadcast receivers is established to
be $O(2^{-n^{\beta}})$ where $0 < \beta < \frac{1}{2}$. The error probability
is analyzed by averaging over polar code ensembles.
In addition, properties such as the chain rule of the
Kullback-Leibler divergence between discrete probability
measures are exploited.

Throughout the paper, for different broadcast coding strate- 
gies, a systems-level block diagram of the communication 
channel and polar transforms is provided. 



B. Relation to Prior Work 

1) Deterministic Broadcast Channels: The deterministic
broadcast channel has received considerable attention in the
literature (e.g. due to related extensions such as secure
broadcast, broadcasting with side information, and index
coding [14], [15]). Several practical codes have been designed.
For example, the authors of [11] propose sparse linear coset
codes to emulate random binning and survey propagation
to enforce broadcast channel constraints. In [12], the
authors propose enumerative source coding and Luby-Transform
codes for deterministic DM-BCs specialized to
interference-management scenarios. Additional research includes
reinforced belief propagation with non-linear coding [13]. To
our knowledge, polarization-based codes provide provable
guarantees for achieving rates on the capacity-boundary in the
general case.

2) Polar Codes for Multi-User Settings: Subsequent to
the derivation of channel polarization in [6] and the refined
rate of polarization in [16], polarization methods have been
extended to analyze multi-user information theory problems.
In [10], a joint polarization method is proposed for
$m$-user MACs with connections to matroid theory. Polar codes
were extended for several other multi-user settings:
arbitrarily-permuted parallel channels [17], degraded relay channels [18],
cooperative relaying [19], and wiretap channels [20]-[22]. In
addition, several binary multi-user communication scenarios
including the Gelfand-Pinsker problem and the Wyner-Ziv
problem were analyzed in [23, Chapter 4]. Polar codes for lossless
and lossy source compression were investigated respectively
in [8] and [24]. In [8], source polarization was extended
to the Slepian-Wolf problem involving distributed sources.
The approach is based on an "onion-peeling" encoding of
sources, whereas a joint encoding is proposed in [25]. In [9],
a unified approach is provided for the Slepian-Wolf problem
based on generalized monotone chain rules of entropy. To our
knowledge, the design of polarization-based broadcast codes
is relatively new.

3) Binary vs. q-ary Polarization: The broadcast codes
constructed in the present paper for DM-BCs are based on
polarization for binary random variables. However, in extending
to arbitrary alphabet sizes, a large body of prior work exists
and has focused on generalized constructions and kernels [26],
and generalized polarization for $q$-ary random variables and
$q$-ary channels [27]-[30]. The reader is also referred to the
monograph in [31] containing a clear overview of polarization
methods.



C. Notation 

An index set $\{1, 2, \ldots, m\}$ is abbreviated as $[m]$. An $m \times n$
matrix array of random variables is comprised of variables
$Y_i(j)$ where $i \in [m]$ represents the row and $j \in [n]$ the
column. The notation $Y_i^{k:\ell} \triangleq \{Y_i(k), Y_i(k+1), \ldots, Y_i(\ell)\}$
for $k \le \ell$. When clear by context, the term $Y_i^{n}$ represents
$Y_i^{1:n}$. In addition, the notation for the random variable $Y_i(j)$ is
used interchangeably with $Y_i^{j}$. The notation $f(n) = O(g(n))$
means that there exists a constant $\kappa$ such that $f(n) \le \kappa g(n)$
for sufficiently large $n$. For a set $S$, $\mathrm{clo}(S)$ represents set
closure, and $\mathrm{co}(S)$ the convex hull operation over set $S$. Let
$h_b(x) = -x \log_2(x) - (1 - x) \log_2(1 - x)$ denote the binary
entropy function. Let $a * b = (1 - a)b + a(1 - b)$.
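The two scalar operations defined in the notation recur in every worked example of the paper, so a minimal Python sketch may help when checking numbers by hand. The function names `binary_entropy` and `binary_convolution` are illustrative choices, not names used by the authors, and the convention $h_b(0) = h_b(1) = 0$ is assumed.

```python
import math


def binary_entropy(x: float) -> float:
    """h_b(x) = -x log2(x) - (1 - x) log2(1 - x), with h_b(0) = h_b(1) = 0 assumed."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def binary_convolution(a: float, b: float) -> float:
    """a * b = (1 - a) b + a (1 - b): crossover probability of two cascaded BSCs."""
    return (1 - a) * b + a * (1 - b)
```

For instance, `binary_convolution(0.1, 0.2)` evaluates $(0.9)(0.2) + (0.1)(0.8)$.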

II. Model 

Definition 1 (Discrete, Memoryless Broadcast Channel):
The discrete memoryless broadcast channel (DM-BC) with
$m$ broadcast receivers consists of a discrete input alphabet
$\mathcal{X}$, discrete output alphabets $\mathcal{Y}_i$ for $i \in [m]$, and a conditional
distribution $P_{Y_1, Y_2, \ldots, Y_m | X}(y_1, y_2, \ldots, y_m | x)$ where $x \in \mathcal{X}$
and $y_i \in \mathcal{Y}_i$.

Definition 2 (Private Messages): For a DM-BC with $m$
broadcast receivers, there exist $m$ private messages $\{W_i\}_{i \in [m]}$
such that each message $W_i$ is composed of $nR_i$ bits and
$(W_1, W_2, \ldots, W_m)$ is uniformly distributed over $[2^{nR_1}] \times
[2^{nR_2}] \times \cdots \times [2^{nR_m}]$.

Definition 3 (Channel Encoding and Decoding): For the
DM-BC with independent messages, let the vector of rates
$R \triangleq [R_1 \ R_2 \ \ldots \ R_m]$. An $(R, n)$ code for the
DM-BC consists of one encoder

$$x^n : [2^{nR_1}] \times [2^{nR_2}] \times \cdots \times [2^{nR_m}] \to \mathcal{X}^n,$$

and $m$ decoders specified by $\hat{W}_i : \mathcal{Y}_i^n \to [2^{nR_i}]$ for $i \in [m]$.
Based on received observations $\{Y_i(j)\}_{j \in [n]}$, each decoder
outputs a decoded message $\hat{W}_i$.

Definition 4 (Average Probability of Error): The average
probability of error $P_e^{(n)}$ for a DM-BC code is defined to be
the probability that the decoded message at any of the receivers is
not equal to the transmitted message:

$$P_e^{(n)} = \mathbb{P}\left[ \bigvee_{i=1}^{m} \hat{W}_i\big(\{Y_i(j)\}_{j \in [n]}\big) \neq W_i \right].$$

Definition 5 (Private-Message Capacity Region): If there
exists a sequence of $(R, n)$ codes with $P_e^{(n)} \to 0$, then the
rates $R \in \mathbb{R}^m$ are achievable. The private-message capacity
region is the closure of the set of achievable rates.

III. Deterministic Broadcast Channels 

Definition 6 (Deterministic DM-BC): Define $m$ deterministic
functions $f_i(x) : \mathcal{X} \to \mathcal{Y}_i$ for $i \in [m]$. The deterministic
DM-BC with $m$ receivers is defined by the following conditional
distribution

$$P_{Y_1, Y_2, \ldots, Y_m | X}(y_1, y_2, \ldots, y_m | x) = \prod_{i=1}^{m} \mathbb{1}\left[ y_i = f_i(x) \right]. \tag{1}$$

Fig. 1. (Plot title: Capacity Region of Blackwell Channel.) Blackwell Channel: An example of a deterministic broadcast channel with $m = 2$ broadcast users. The channel is defined as $Y_1 = f_1(X)$ and
$Y_2 = f_2(X)$ where the non-linear functions $f_1(x) = \max(x - 1, 0)$ and $f_2(x) = \min(x, 1)$. The private-message capacity region of the Blackwell channel
is drawn. For different input distributions $P_X(x)$, the achievable rate points are contained within corresponding polyhedrons in $\mathbb{R}^2$.



A. Capacity Region 

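Proposition 1 (Marton [32], Pinsker [33]): The capacity
region of the deterministic DM-BC includes those rate-tuples
$R \in \mathbb{R}_+^m$ in the region

$$\mathfrak{C}_{\text{det-bc}} = \mathrm{co}\left( \mathrm{clo}\left( \bigcup_{X, \{Y_i\}_{i \in [m]}} \mathfrak{R}\big(X, \{Y_i\}_{i \in [m]}\big) \right) \right), \tag{2}$$

where the polyhedral region $\mathfrak{R}(X, \{Y_i\}_{i \in [m]})$ is given by

$$\mathfrak{R}\big(X, \{Y_i\}_{i \in [m]}\big) \triangleq \left\{ R \ \middle|\ \sum_{i \in \mathcal{S}} R_i < H\big(\{Y_i\}_{i \in \mathcal{S}}\big), \ \forall \mathcal{S} \subseteq [m] \right\}. \tag{3}$$

The union in Eqn. (2) is over all random variables
$X, Y_1, Y_2, \ldots, Y_m$ with joint distribution induced by $P_X(x)$
and $Y_i = f_i(X)$.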

Example 1 (Blackwell Channel): In Figure 1, the Blackwell
channel is depicted with $\mathcal{X} = \{0, 1, 2\}$ and $\mathcal{Y}_i = \{0, 1\}$.
For any fixed distribution $P_X(x)$, it is seen that $P_{Y_1 Y_2}(y_1, y_2)$
has zero mass for the pair $(1, 0)$. Let $\alpha \in [\frac{1}{2}, \frac{2}{3}]$. Due to the
symmetry of this channel, the capacity region is the union of
two regions,

$$\{(R_1, R_2) : R_1 \le h_b(\alpha), \ R_2 \le h_b(\tfrac{\alpha}{2}), \ R_1 + R_2 \le h_b(\alpha) + \alpha\},$$

$$\{(R_1, R_2) : R_1 \le h_b(\tfrac{\alpha}{2}), \ R_2 \le h_b(\alpha), \ R_1 + R_2 \le h_b(\alpha) + \alpha\},$$

where the first region is achieved with input distribution
$P_X(0) = P_X(1) = \frac{\alpha}{2}$, and the second region is achieved
with $P_X(1) = P_X(2) = \frac{\alpha}{2}$ [2, Lec. 9]. The sum rate is
maximized for a uniform input distribution which yields a
pentagonal achievable rate region: $R_1 \le h_b(\frac{1}{3})$, $R_2 \le h_b(\frac{1}{3})$,
$R_1 + R_2 \le \log_2 3$. Figure 1 illustrates the capacity region.
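The entropy expressions quoted in the Blackwell example can be checked numerically. The following is a small sketch, assuming the channel functions $f_1(x) = \max(x - 1, 0)$ and $f_2(x) = \min(x, 1)$ from the Figure 1 caption; the helper names `entropy`, `blackwell_outputs`, and `output_entropies` are hypothetical and introduced only for illustration.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def blackwell_outputs(x):
    """Deterministic outputs (y1, y2) of the Blackwell channel for x in {0, 1, 2}."""
    return max(x - 1, 0), min(x, 1)


def output_entropies(px):
    """Return (H(Y1), H(Y2), H(Y1, Y2)) induced by an input pmf {x: probability}."""
    joint, marg1, marg2 = {}, {}, {}
    for x, p in px.items():
        y1, y2 = blackwell_outputs(x)
        joint[(y1, y2)] = joint.get((y1, y2), 0.0) + p
        marg1[y1] = marg1.get(y1, 0.0) + p
        marg2[y2] = marg2.get(y2, 0.0) + p
    return entropy(marg1), entropy(marg2), entropy(joint)


if __name__ == "__main__":
    alpha = 0.5
    # Input distribution of the first region: P_X(0) = P_X(1) = alpha / 2.
    print(output_entropies({0: alpha / 2, 1: alpha / 2, 2: 1 - alpha}))
    # Uniform input, which maximizes the sum rate.
    print(output_entropies({0: 1 / 3, 1: 1 / 3, 2: 1 / 3}))
```

With $P_X(0) = P_X(1) = \alpha/2$ the three returned values should equal $h_b(\alpha)$, $h_b(\alpha/2)$, and $h_b(\alpha) + \alpha$, which are the three bounds of the first region.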



B. Main Result 

Theorem 1 (Polar Code for Deterministic DM-BC):
Consider an $m$-user deterministic DM-BC with arbitrary
discrete input alphabet $\mathcal{X}$, and binary output alphabets
$\mathcal{Y}_i = \{0, 1\}$. Fix input distribution $P_X(x)$ where $x \in \mathcal{X}$ and
constant $0 < \beta < \frac{1}{2}$. Let $\pi : [m] \to [m]$ be a permutation on
the index set of receivers. Let the vector

$$R \triangleq [R_{\pi(1)} \ R_{\pi(2)} \ \ldots \ R_{\pi(m)}].$$

There exists a sequence of polar broadcast codes over $n$
channel uses which achieves rates $R$ where the rate for receiver
$\pi(i) \in [m]$ is bounded as

$$0 \le R_{\pi(i)} < H\big(Y_{\pi(i)} \,\big|\, \{Y_{\pi(k)}\}_{k = 1:i-1}\big).$$

The average error probability of this code sequence decays as
$P_e^{(n)} = O(2^{-n^{\beta}})$. The complexity of encoding and decoding
is $O(n \log n)$.

Remark 1: To prove the existence of low-complexity broadcast
codes, a successive randomized protocol is introduced in
Section V-A which utilizes $o(n)$ bits of randomness at the
encoder. A deterministic encoding protocol is also presented.

Remark 2: The achievable rates for a fixed input distribution
$P_X(x)$ are the vertex points of the polyhedral rate region
defined in (3). To achieve non-vertex points, the following
coding strategies could be applied: time-sharing; rate-splitting
for the deterministic DM-BC [34]; polarization by Arikan
utilizing generalized chain rules of entropy [9]. For certain
input distributions $P_X(x)$, as illustrated in Figure 1 for the
Blackwell channel, a subset of the achievable vertex points lie
on the capacity boundary.

Remark 3: Polarization of channels and sources extends to
$q$-ary alphabets (see e.g. [27]). Similarly, it is entirely possible
to extend Theorem 1 to include DM-BCs with $q$-ary output
alphabets.
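The vertex rates named in Theorem 1 and Remark 2 follow from the chain rule of entropy, one vertex per permutation $\pi$. The sketch below enumerates them for a deterministic channel given as a list of Python functions; `joint_entropy` and `vertex_rates` are hypothetical helper names, receivers are indexed from 0, and the Blackwell channel with a uniform input is used only as an assumed example.

```python
import math
from itertools import permutations


def joint_entropy(px, funcs, subset):
    """H({Y_i : i in subset}) in bits for Y_i = funcs[i](X) and X ~ px."""
    pmf = {}
    for x, p in px.items():
        key = tuple(funcs[i](x) for i in subset)
        pmf[key] = pmf.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def vertex_rates(px, funcs):
    """Map each receiver ordering to the rate vector with
    R_{pi(i)} = H(Y_{pi(i)} | Y_{pi(1)}, ..., Y_{pi(i-1)})."""
    vertices = {}
    for order in permutations(range(len(funcs))):
        rates = [0.0] * len(funcs)
        for pos, receiver in enumerate(order):
            # Conditional entropy as a difference of joint entropies (chain rule).
            rates[receiver] = (joint_entropy(px, funcs, order[:pos + 1])
                               - joint_entropy(px, funcs, order[:pos]))
        vertices[order] = tuple(rates)
    return vertices


if __name__ == "__main__":
    blackwell = [lambda x: max(x - 1, 0), lambda x: min(x, 1)]
    uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
    for order, rates in vertex_rates(uniform, blackwell).items():
        print(order, rates)
```

Each rate vector sums to $H(Y_1, \ldots, Y_m)$, which is why every vertex lies on the sum-rate face of the polyhedron in (3).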
IV. Overview of Polarization Method
For Deterministic DM-BCs

For the proof of Theorem 1, we utilize binary polarization
theorems. By contrast to polarization for point-to-point channels,
in the case of deterministic DM-BCs, the polar transform
is applied to the output random variables of the channel.

A. Polar Transform 

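Consider an input distribution $P_X(x)$ to the deterministic
DM-BC. Over $n$ channel uses, the input random variables to
the channel are given by

$$X^{1:n} = \{X(1), X(2), \ldots, X(n)\},$$

where $X(j) \sim P_X$ are independent and identically distributed
(i.i.d.) random variables. The channel output variables are
given by $Y_i(j) = f_i(X(j))$ where $f_i(\cdot)$ are the deterministic
functions to each broadcast receiver. Denote the random matrix
of channel output variables by

$$\mathbf{Y} \triangleq \begin{bmatrix} Y_1^1 & Y_1^2 & Y_1^3 & \cdots & Y_1^n \\ Y_2^1 & Y_2^2 & Y_2^3 & \cdots & Y_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ Y_m^1 & Y_m^2 & Y_m^3 & \cdots & Y_m^n \end{bmatrix}, \tag{4}$$

where $\mathbf{Y} \in \mathbb{F}_2^{m \times n}$. For $n = 2^{\ell}$ and $\ell \ge 1$, the polar transform
is defined as the following invertible linear transformation,

$$\mathbf{U} = \mathbf{Y} G_n, \quad \text{where } G_n \triangleq \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes \log_2 n} B_n. \tag{5}$$

The matrix $G_n \in \mathbb{F}_2^{n \times n}$ is formed by multiplying a matrix of
successive Kronecker matrix-products (denoted by $\otimes$) with a
bit-reversal matrix $B_n$ introduced by Arikan [8]. The polarized
random matrix $\mathbf{U} \in \mathbb{F}_2^{m \times n}$ is indexed as

$$\mathbf{U} = \begin{bmatrix} U_1^1 & U_1^2 & U_1^3 & \cdots & U_1^n \\ U_2^1 & U_2^2 & U_2^3 & \cdots & U_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ U_m^1 & U_m^2 & U_m^3 & \cdots & U_m^n \end{bmatrix}. \tag{6}$$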
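To make the polar transform concrete, here is a minimal NumPy sketch that builds a matrix $G_n$ as a Kronecker power of the $2 \times 2$ kernel multiplied by a bit-reversal permutation matrix, and applies it row by row to a binary matrix. The kernel $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ is assumed here as Arikan's standard choice, and `polar_matrix` is an illustrative name. The dense $n \times n$ matrix is for exposition only; it does not reflect the $O(n \log n)$ butterfly implementation.

```python
import numpy as np


def polar_matrix(n: int) -> np.ndarray:
    """Dense G_n over F_2 for n a power of two: Kronecker power times bit reversal."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two with n >= 2")
    levels = n.bit_length() - 1
    kernel = np.array([[1, 0], [1, 1]], dtype=int)
    kron_power = np.array([[1]], dtype=int)
    for _ in range(levels):
        kron_power = np.kron(kron_power, kernel)
    # Row j of the permutation matrix has its single 1 in column bitreverse(j).
    reversed_index = [int(format(j, f"0{levels}b")[::-1], 2) for j in range(n)]
    bit_reversal = np.eye(n, dtype=int)[reversed_index]
    return (kron_power @ bit_reversal) % 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = polar_matrix(8)
    Y = rng.integers(0, 2, size=(2, 8))   # m = 2 rows of channel outputs
    U = (Y @ G) % 2                       # polar transform of each row
    Y_back = (U @ G) % 2                  # G_n is its own inverse over F_2
    print(np.array_equal(Y, Y_back))
```

Under these assumptions $G_n$ is an involution over $\mathbb{F}_2$, so the same routine serves for the forward and the inverse transform.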



B. Joint Distribution of Polarized Variables 

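Consider the channel output distribution $P_{Y_1 Y_2 \cdots Y_m}$ of the
deterministic DM-BC induced by input distribution $P_X(x)$.
The $j$-th column of the random matrix $\mathbf{Y}$ is distributed as
$(Y_1^j, Y_2^j, \ldots, Y_m^j) \sim P_{Y_1 Y_2 \cdots Y_m}$. Due to the memoryless
property of the channel, the joint distribution of all output
variables is

$$P_{Y_1^n Y_2^n \cdots Y_m^n}\big(y_1^n, y_2^n, \ldots, y_m^n\big) = \prod_{j=1}^{n} P_{Y_1 Y_2 \cdots Y_m}\big(y_1^j, y_2^j, \ldots, y_m^j\big). \tag{7}$$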



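Fig. 2. The polar transform applied to random matrix $\mathbf{Y}$ with i.i.d. structure
results in a polarized random matrix $\mathbf{U}$. (Panel labels: $H(Y_i(j) \mid Y_i^{1:j-1}, \{Y_k^{1:n}\}_{k \in [1:i-1]})$ for $\mathbf{Y}$ and $H(U_i(j) \mid U_i^{1:j-1}, \{U_k^{1:n}\}_{k \in [1:i-1]})$ for $\mathbf{U}$.)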



The joint distribution of the matrix variables in $\mathbf{Y}$ is characterized
easily due to the i.i.d. structure. The polarized random
matrix $\mathbf{U}$ does not have an i.i.d. structure. However, one way
to define the joint distribution of the variables in $\mathbf{U}$ is via the
polar transform equation (5). An alternate representation is via
a decomposition into conditional distributions as follows (see footnote 1):

$$P_{U_1^n U_2^n \cdots U_m^n}\big(u_1^n, u_2^n, \ldots, u_m^n\big) = \prod_{i=1}^{m} \prod_{j=1}^{n} P\big(u_i(j) \,\big|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k \in [1:i-1]}\big). \tag{8}$$

As derived by Arikan in [8] and summarized in Section IV-E,
the conditional probabilities in (8) and associated likelihoods
may be computed using a dynamic programming method
which "divides-and-conquers" the computations efficiently.



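C. Polarization of Conditional Entropies

Proposition 2 (Polarization [8]): Consider the pair of random
matrices $(\mathbf{Y}, \mathbf{U})$ related through the polar transformation
in (5). For $i \in [m]$ and any $\epsilon \in (0, 1)$, define the set of indices

$$\mathcal{A}_i^{(n)} \triangleq \left\{ j \in [n] : H\big(U_i(j) \,\big|\, U_i^{1:j-1}, \{Y_k^{1:n}\}_{k \in [1:i-1]}\big) \ge 1 - \epsilon \right\}. \tag{9}$$

Then in the limit as $n \to \infty$,

$$\frac{1}{n} \left| \mathcal{A}_i^{(n)} \right| \to H(Y_i \mid Y_1 Y_2 \cdots Y_{i-1}). \tag{10}$$

Footnote 1: The abbreviated notation of the form $P(a|b)$ which appears in (8) indicates
$P_{A|B}(a|b)$, i.e. the conditional probability $\mathbb{P}\{A = a \mid B = b\}$ where $A$ and
$B$ are random variables.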
For sufficiently large $n$, Proposition 2 establishes that there
exist approximately $nH(Y_i \mid Y_1 Y_2 \cdots Y_{i-1})$ indices per row
$i \in [m]$ of random matrix $\mathbf{U}$ for which the conditional entropy
is close to 1. The total number of indices in $\mathbf{U}$ for which
the conditional entropy terms polarize to 1 is approximately
$nH(Y_1 Y_2 \cdots Y_m)$. The polarization phenomenon is illustrated
in Figure 2.

Remark 4: Since the polar transform $G_n$ is invertible,
$\{U_k^{1:n}\}_{k \in [1:i-1]}$ are in one-to-one correspondence
with $\{Y_k^{1:n}\}_{k \in [1:i-1]}$. Therefore the conditional entropies
$H(U_i(j) \mid U_i^{1:j-1}, \{U_k^{1:n}\}_{k \in [1:i-1]})$ also polarize to 0 or 1.
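For very short block lengths the polarization of conditional entropies can be observed by brute force. The sketch below treats only a single row ($m = 1$) with i.i.d. Bernoulli($p$) entries, which is a simplifying assumption relative to the multi-row setting of Proposition 2; it enumerates all $2^n$ sequences, so it is practical only for small $n$. The names `polar_matrix` and `conditional_entropies` are illustrative, and the kernel is again assumed to be Arikan's.

```python
import itertools
import math

import numpy as np


def polar_matrix(n: int) -> np.ndarray:
    """Dense G_n over F_2 (Kronecker power of [[1,0],[1,1]] times bit reversal)."""
    levels = n.bit_length() - 1
    kron_power = np.array([[1]], dtype=int)
    for _ in range(levels):
        kron_power = np.kron(kron_power, np.array([[1, 0], [1, 1]], dtype=int))
    reversed_index = [int(format(j, f"0{levels}b")[::-1], 2) for j in range(n)]
    return (kron_power @ np.eye(n, dtype=int)[reversed_index]) % 2


def conditional_entropies(p: float, n: int):
    """H(U(j) | U^{1:j-1}) for j = 1..n, where U = Y G_n and Y is i.i.d. Bernoulli(p)."""
    G = polar_matrix(n)
    pmf_u = {}
    for y in itertools.product((0, 1), repeat=n):
        u = tuple(int(b) for b in (np.array(y) @ G) % 2)
        weight = sum(y)
        pmf_u[u] = p ** weight * (1 - p) ** (n - weight)  # y -> u is a bijection
    prefix_entropy = [0.0]
    for j in range(1, n + 1):
        marginal = {}
        for u, q in pmf_u.items():
            marginal[u[:j]] = marginal.get(u[:j], 0.0) + q
        prefix_entropy.append(-sum(q * math.log2(q) for q in marginal.values() if q > 0))
    # Chain rule: H(U(j) | U^{1:j-1}) = H(U^{1:j}) - H(U^{1:j-1}).
    return [prefix_entropy[j] - prefix_entropy[j - 1] for j in range(1, n + 1)]


if __name__ == "__main__":
    for j, h in enumerate(conditional_entropies(0.11, 8), start=1):
        print(j, round(h, 4))
```

The printed values sum to $n\,h_b(p)$ because the transform is invertible; as $n$ grows, a fraction of them approaching $h_b(p)$ moves toward 1 and the remainder toward 0.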



D. Rate of Polarization 

The Bhattacharyya parameter of random variables is closely
related to the conditional entropy. The parameter is useful for
characterizing the rate of polarization.

Definition 7 (Bhattacharyya Parameter): Let $(T, V) \sim
P_{T,V}$ where $T \in \{0, 1\}$ and $V \in \mathcal{V}$ where $\mathcal{V}$ is an arbitrary
discrete alphabet. The Bhattacharyya parameter $Z(T|V) \in [0, 1]$
is defined

$$Z(T|V) = 2 \sum_{v \in \mathcal{V}} P_V(v) \sqrt{P_{T|V}(0|v) \, P_{T|V}(1|v)}. \tag{11}$$

As shown in Lemma 16 of Appendix A, $Z(T|V) \to 1$ implies
$H(T|V) \to 1$, and similarly $Z(T|V) \to 0$ implies $H(T|V) \to 0$
for $T$ a binary random variable. Based on the Bhattacharyya
parameter, the following proposition specifies sets $\mathcal{M}_i^{(n)} \subseteq [n]$
that will be called message sets.

Proposition 3 (Rate of Polarization): Consider the pair of
random matrices $(\mathbf{Y}, \mathbf{U})$ related through the polar
transformation in (5). Fix constants $0 < \beta < \frac{1}{2}$, $\tau > 0$, $i \in [m]$. Let
$\delta_n = 2^{-n^{\beta}}$ be the rate of polarization. Define the set

$$\mathcal{M}_i^{(n)} \triangleq \left\{ j \in [n] : Z\big(U_i(j) \,\big|\, U_i^{1:j-1}, \{Y_k^{1:n}\}_{k \in [1:i-1]}\big) \ge 1 - \delta_n \right\}. \tag{12}$$

Then there exists an $N_0 = N_0(\beta, \tau)$ such that

$$\frac{1}{n} \left| \mathcal{M}_i^{(n)} \right| \ge H(Y_i \mid Y_1 Y_2 \cdots Y_{i-1}) - \tau, \tag{13}$$

for all $n > N_0$.

The proposition is established via the Martingale Convergence
Theorem by defining a super-martingale with respect to the
Bhattacharyya parameters [6], [8]. The rate of polarization is
characterized by Arikan and Telatar in [16].

Remark 5: The message sets $\mathcal{M}_i^{(n)}$ are computed "offline"
only once during a code construction phase. The sets do not
depend on the realization of random variables. In the following
Section IV-E, a Monte Carlo sampling approach for estimating
Bhattacharyya parameters is reviewed. Other highly efficient
algorithms are known in the literature for finding the message
indices (see e.g. Tal and Vardy [35]).
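For small alphabets the Bhattacharyya parameter and the conditional entropy can both be evaluated exactly from a joint pmf, which gives a quick sanity check on the relationship the text describes. This is a minimal sketch assuming the standard form $Z(T|V) = 2 \sum_v \sqrt{P_{T,V}(0, v) P_{T,V}(1, v)}$ for binary $T$; the function names and the dictionary representation `{(t, v): probability}` are illustrative choices.

```python
import math


def bhattacharyya(joint):
    """Z(T|V) for binary T, from a joint pmf given as {(t, v): probability}."""
    values = {v for (_, v) in joint}
    return 2 * sum(
        math.sqrt(joint.get((0, v), 0.0) * joint.get((1, v), 0.0)) for v in values
    )


def conditional_entropy(joint):
    """H(T|V) in bits for binary T, from the same joint pmf representation."""
    values = {v for (_, v) in joint}
    h = 0.0
    for v in values:
        p0, p1 = joint.get((0, v), 0.0), joint.get((1, v), 0.0)
        for p in (p0, p1):
            if p > 0:
                h -= p * math.log2(p / (p0 + p1))
    return h


if __name__ == "__main__":
    # T uniform, V = T observed through a binary symmetric channel with flip 0.1.
    bsc_joint = {(0, 0): 0.45, (1, 0): 0.05, (0, 1): 0.05, (1, 1): 0.45}
    print(bhattacharyya(bsc_joint), conditional_entropy(bsc_joint))
```

The two extreme cases behave as expected: when $T$ is uniform and independent of $V$ both quantities equal 1, and when $T$ is a function of $V$ both equal 0.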



E. Estimating Bhattacharyya Parameters 

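As shown in Lemma 11 in Appendix A, one way to
estimate the Bhattacharyya parameter $Z(T|V)$ is to sample
from the distribution $P_{T,V}(t, v)$ and evaluate $\mathbb{E}_{T,V} \sqrt{\varphi(T, V)}$.
The function $\varphi(t, v)$ is defined based on likelihood ratios

$$L(v) \triangleq \frac{P_{T|V}(0|v)}{P_{T|V}(1|v)}, \qquad L^{-1}(v) \triangleq \frac{P_{T|V}(1|v)}{P_{T|V}(0|v)}.$$

Similarly, to determine the indices in the message sets
defined in Proposition 3, the Bhattacharyya parameters
$Z\big(U_i(j) \,\big|\, U_i^{1:j-1}, \{Y_k^{1:n}\}_{k \in [1:i-1]}\big)$ must be estimated
efficiently. For $n \ge 2$, define the likelihood ratio

$$L_n^{(i,j)}\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big) \triangleq \frac{\mathbb{P}\big(U_i(j) = 0 \,\big|\, U_i^{1:j-1} = u_i^{1:j-1}, \{Y_k^{1:n} = y_k^{1:n}\}_{k \in [1:i-1]}\big)}{\mathbb{P}\big(U_i(j) = 1 \,\big|\, U_i^{1:j-1} = u_i^{1:j-1}, \{Y_k^{1:n} = y_k^{1:n}\}_{k \in [1:i-1]}\big)}. \tag{14}$$

The dynamic programming method given in [8] allows for
a recursive computation of the likelihood ratio. Define the
following sub-problems

$$\Xi_1 = L_{n/2}^{(i,j)}\big(u_{i,o}^{1:2j-2} \oplus u_{i,e}^{1:2j-2}, \{y_k^{1:n/2}\}_{k \in [1:i-1]}\big),$$

$$\Xi_2 = L_{n/2}^{(i,j)}\big(u_{i,e}^{1:2j-2}, \{y_k^{n/2+1:n}\}_{k \in [1:i-1]}\big),$$

where the notation $u_{i,o}^{1:2j-2}$ and $u_{i,e}^{1:2j-2}$ represents the odd
and even indices respectively of the sequence $u_i^{1:2j-2}$. The
recursive computation of the likelihoods is characterized by

$$L_n^{(i,2j-1)}\big(u_i^{1:2j-2}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big) = \frac{\Xi_1 \Xi_2 + 1}{\Xi_1 + \Xi_2},$$

$$L_n^{(i,2j)}\big(u_i^{1:2j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big) = (\Xi_1)^{\gamma} \, \Xi_2,$$

where $\gamma = 1$ if $u_i(2j - 1) = 0$ and $\gamma = -1$ if $u_i(2j - 1) = 1$.
In the above recursive computations, the base case is for
sequences of length $n = 2$.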
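The sampling estimator described at the start of this subsection can be sketched in a few lines. One choice of $\varphi$ that is consistent with Definition 7 is $\varphi(t, v) = L(v)$ for $t = 1$ and $\varphi(t, v) = L^{-1}(v)$ for $t = 0$, since then $\mathbb{E}\sqrt{\varphi(T, V)} = 2 \sum_v P_V(v) \sqrt{P_{T|V}(0|v) P_{T|V}(1|v)}$; this specific form is an assumption here rather than a quotation of the paper's lemma. The callables `sample_pair` and `posterior_zero` are hypothetical interfaces, and the posterior is assumed to lie strictly inside $(0, 1)$.

```python
import math
import random


def estimate_bhattacharyya(sample_pair, posterior_zero, num_samples, seed=0):
    """Monte Carlo estimate of Z(T|V) as the sample mean of sqrt(phi(T, V)).

    sample_pair(rng) returns one draw (t, v) from P_{T,V};
    posterior_zero(v) returns P(T = 0 | V = v), assumed strictly between 0 and 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t, v = sample_pair(rng)
        p0 = posterior_zero(v)
        likelihood_ratio = p0 / (1 - p0)                       # L(v)
        phi = likelihood_ratio if t == 1 else 1 / likelihood_ratio
        total += math.sqrt(phi)
    return total / num_samples


def bsc_sample(rng, flip=0.1):
    """Uniform bit T observed through a binary symmetric channel with the given flip."""
    t = int(rng.random() < 0.5)
    v = t ^ int(rng.random() < flip)
    return t, v


def bsc_posterior_zero(v, flip=0.1):
    """P(T = 0 | V = v) for the uniform-input BSC above."""
    return 1 - flip if v == 0 else flip


if __name__ == "__main__":
    estimate = estimate_bhattacharyya(bsc_sample, bsc_posterior_zero, 200_000)
    exact = 2 * math.sqrt(0.1 * 0.9)
    print(estimate, exact)
```

In a polar code construction the posterior would come from the likelihood-ratio recursion of (14) rather than from a closed form; the BSC pair is used here only because its exact value $2\sqrt{0.1 \cdot 0.9}$ is easy to compare against.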

V. Proof of Theorem 1

The proof of Theorem 1 is based on binary polarization
theorems as discussed in Section IV. The random coding
arguments of C. E. Shannon prove the existence of
capacity-achieving codes for point-to-point channels. Furthermore,
random binning and joint-typicality arguments suffice to prove
the existence of capacity-achieving codes for the deterministic
DM-BC. However, it is shown in this section that there
exist capacity-achieving polar codes for the binary-output
deterministic DM-BC.






A. Broadcast Code Based on Polarization 

The ordering of the receivers' rates in $R$ is arbitrary due to
symmetry. Therefore, let $\pi(i) = i$ be the identity permutation
which denotes the successive order in which the message bits
are allocated for each receiver. The encoder must map $m$
independent messages $(W_1, W_2, \ldots, W_m)$ uniformly distributed
over $[2^{nR_1}] \times [2^{nR_2}] \times \cdots \times [2^{nR_m}]$ to a codeword $x^n \in \mathcal{X}^n$.

To construct a codeword for broadcasting $m$ independent
messages, the following binary sequences are formed at the
encoder: $u_1^{1:n}, u_2^{1:n}, \ldots, u_m^{1:n}$. To determine a particular bit
$u_i(j)$ in the binary sequence $u_i^{1:n}$, if $j \in \mathcal{M}_i^{(n)}$, the bit is
selected as a uniformly distributed message bit intended for
receiver $i \in [m]$. As defined in (12) of Proposition 3, the
message set $\mathcal{M}_i^{(n)}$ represents those indices for bits transmitted
to receiver $i$. The remaining non-message indices in the binary
sequence $u_i^{1:n}$ for each user $i \in [m]$ are computed either
according to a deterministic or random mapping.

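1) Deterministic Mapping: Consider a class of deterministic
boolean functions indexed by $i \in [m]$ and $j \in [n]$:

$$\psi^{(i,j)} : \{0, 1\}^{n(\max\{0, i-1\}) + j - 1} \to \{0, 1\}. \tag{15}$$

As an example, consider the deterministic boolean function
based on the maximum a posteriori polar coding rule.

$$\psi_{\mathrm{MAP}}^{(i,j)}\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big) \triangleq \arg\max_{u \in \{0,1\}} \mathbb{P}\big(U_i(j) = u \,\big|\, U_i^{1:j-1} = u_i^{1:j-1}, \{Y_k^{1:n} = y_k^{1:n}\}_{k \in [1:i-1]}\big). \tag{16}$$

2) Random Mapping: Consider a class of random boolean
functions indexed by $i \in [m]$ and $j \in [n]$:

$$\Psi^{(i,j)} : \{0, 1\}^{n(\max\{0, i-1\}) + j - 1} \to \{0, 1\}. \tag{17}$$

As an example, consider the random boolean function

$$\Psi_{\mathrm{RAND}}^{(i,j)}\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big) \triangleq \begin{cases} 0, & \text{w.p. } \lambda_0\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big), \\ 1, & \text{w.p. } 1 - \lambda_0\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big), \end{cases} \tag{18}$$

where

$$\lambda_0\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big) \triangleq \mathbb{P}\big(U_i(j) = 0 \,\big|\, U_i^{1:j-1} = u_i^{1:j-1}, \{Y_k^{1:n} = y_k^{1:n}\}_{k \in [1:i-1]}\big).$$

The random boolean function $\Psi_{\mathrm{RAND}}^{(i,j)}$ may be thought of as
a vector of Bernoulli random variables indexed by the input to
the function. Each Bernoulli random variable of the vector has
a fixed probability of being one or zero that is well-defined.

3) Mapping From Messages To Codeword: The binary
sequences $u_i^{1:n}$ for $i \in [m]$ are formed successively bit by
bit. If $j \in \mathcal{M}_i^{(n)}$, then the bit $u_i(j)$ is one message bit
from the uniformly distributed message $W_i$ intended for user
$i$. If $j \notin \mathcal{M}_i^{(n)}$, then $u_i(j) = \psi_{\mathrm{MAP}}^{(i,j)}\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big)$
in the case of a deterministic mapping, or $u_i(j) =
\Psi_{\mathrm{RAND}}^{(i,j)}\big(u_i^{1:j-1}, \{y_k^{1:n}\}_{k \in [1:i-1]}\big)$ in the case of a random
mapping. The encoder then applies the inverse polar transform
for each sequence: $y_i^{1:n} = u_i^{1:n} G_n^{-1}$. The codeword $x^n$ is
formed symbol-by-symbol as follows:

$$x(j) \in \bigcap_{i=1}^{m} f_i^{-1}\big(y_i(j)\big).$$

If the intersection set is empty, the encoder declares a block
error. A block error only occurs at the encoder.

4) Decoding at Receivers: If the encoder succeeds in
transmitting a codeword $x^n$, each receiver obtains the sequence
$y_i^{1:n}$ noiselessly and applies the polar transform $G_n$ to recover
$u_i^{1:n}$ exactly. Since the message indices $\mathcal{M}_i^{(n)}$ are known to
each receiver, the message bits in $u_i^{1:n}$ are decoded correctly
by receiver $i$.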
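The symbol-by-symbol codeword formation of the broadcast protocol, together with the noiseless reception at each receiver, can be illustrated in isolation from the polar transform. The sketch below assumes the output sequences $y_i^{1:n}$ have already been produced by the encoder; `form_codeword` and `receive` are hypothetical helper names, and the Blackwell channel serves as the assumed example. Picking the first element of a non-empty intersection is an arbitrary tie-breaking choice, not something specified in the text.

```python
def form_codeword(y_rows, funcs, alphabet):
    """Pick x(j) from the intersection of the preimages f_i^{-1}(y_i(j)).

    y_rows[i] is the target output sequence for receiver i. Returns the codeword
    as a list, or None to signal a block error (some intersection is empty).
    """
    codeword = []
    for column in zip(*y_rows):
        candidates = [
            x for x in alphabet
            if all(f(x) == y for f, y in zip(funcs, column))
        ]
        if not candidates:
            return None          # inconsistent output tuple: encoder declares an error
        codeword.append(candidates[0])
    return codeword


def receive(codeword, funcs):
    """Noiseless outputs of the deterministic DM-BC, one sequence per receiver."""
    return [[f(x) for x in codeword] for f in funcs]


if __name__ == "__main__":
    blackwell = [lambda x: max(x - 1, 0), lambda x: min(x, 1)]
    targets = [[0, 1, 0, 0], [0, 1, 1, 1]]          # consistent pairs only
    x = form_codeword(targets, blackwell, alphabet=[0, 1, 2])
    print(x, receive(x, blackwell) == targets)
    print(form_codeword([[1], [0]], blackwell, alphabet=[0, 1, 2]))  # pair (1, 0)
```

Each receiver would then apply $G_n$ to its own row and read off the bits at its message indices; the pair $(1, 0)$ in the last call has no preimage under the Blackwell channel, which is exactly the encoder error event.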



B. Total Variation Bound 



While the deterministic mapping $\psi_{\mathrm{MAP}}^{(i,j)}$ performs well in
practice, the average probability of error $P_e^{(n)}$ of the coding
scheme is more difficult to analyze in theory. The random
mapping $\Psi_{\mathrm{RAND}}^{(i,j)}$ at the encoder is more amenable to analysis
via the probabilistic method. Towards that goal, consider the
following probability measure defined on the space of tuples
of binary sequences (see footnote 2):

$$Q\big(u_1^n, u_2^n, \ldots, u_m^n\big) \triangleq \prod_{i=1}^{m} \prod_{j=1}^{n} Q\big(u_i(j) \,\big|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k \in [1:i-1]}\big), \tag{19}$$

where the conditional probability measure

$$Q\big(u_i(j) \,\big|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k \in [1:i-1]}\big) \triangleq \begin{cases} \frac{1}{2}, & \text{if } j \in \mathcal{M}_i^{(n)}, \\ P\big(u_i(j) \,\big|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k \in [1:i-1]}\big), & \text{otherwise.} \end{cases}$$

The probability measure $Q$ defined in (19) is a perturbation of
the joint probability measure $P$ defined in (8) for the random
variables $U_i(j)$. The only difference in definition between $P$
and $Q$ is due to those indices in message set $\mathcal{M}_i^{(n)}$. The
following lemma provides a bound on the total variation
distance between $P$ and $Q$.

Footnote 2: A related proof technique was provided for lossy source coding based on
polarization in a different context [24]. In the present paper, a different proof
is supplied that utilizes the chain rule for KL-divergence.






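Lemma 1: (Total Variation Bound) Let probability measures
$P$ and $Q$ be defined as in (8) and (19) respectively.
Let $0 < \beta < \frac{1}{2}$. For sufficiently large $n$, the total variation
distance between $P$ and $Q$ is bounded as

$$\sum_{\{u_k^{1:n}\}_{k \in [m]}} \left| P\big(\{u_k^{1:n}\}_{k \in [m]}\big) - Q\big(\{u_k^{1:n}\}_{k \in [m]}\big) \right| \le 2^{-n^{\beta}}.$$

Proof: See Section B of the Appendices.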



C. Analysis of the Average Probability of Error 

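For the $m$-user deterministic DM-BC, an error event occurs
at the encoder if a codeword $x^n$ is unable to be constructed
symbol by symbol according to the broadcast protocol
described in Section V-A. Define the following set consisting of
$m$-tuples of binary sequences,

$$\mathcal{T} \triangleq \left\{ (y_1^n, y_2^n, \ldots, y_m^n) : \exists j \in [n], \ \bigcap_{i=1}^{m} f_i^{-1}\big(y_i(j)\big) = \emptyset \right\}. \tag{20}$$

The set $\mathcal{T}$ consists of those $m$-tuples of binary output
sequences which are inconsistent due to the properties of the
deterministic channel. In addition, due to the one-to-one
correspondence between sequences $u_i^{1:n}$ and $y_i^{1:n}$, denote by
$\tilde{\mathcal{T}}$ the set of $m$-tuples $(u_1^n, u_2^n, \ldots, u_m^n)$ that are inconsistent.

For the broadcast protocol, the rate $R_i = \frac{1}{n} |\mathcal{M}_i^{(n)}|$ for each
receiver. Let the total sum rate for all broadcast receivers be
$R_{\Sigma} = \sum_{i \in [m]} R_i$. If the encoder uses a fixed deterministic
map $\psi^{(i,j)}$ in the broadcast protocol, the average probability
of error is

$$P_e^{(n)}\big[\{\psi^{(i,j)}\}\big] = \frac{1}{2^{nR_{\Sigma}}} \sum_{\{u_k^{1:n}\}_{k \in [m]}} \mathbb{1}\big[(u_1^n, u_2^n, \ldots, u_m^n) \in \tilde{\mathcal{T}}\big] \prod_{i \in [m]} \prod_{j \in [n] : j \notin \mathcal{M}_i^{(n)}} \mathbb{1}\big[\psi^{(i,j)}\big(u_i^{1:j-1}, \{u_k^{1:n}\}_{k \in [1:i-1]}\big) = u_i(j)\big]. \tag{21}$$

In addition, if the random maps $\Psi^{(i,j)}$ are used at the encoder,
the average probability of error is a random quantity given by

$$P_e^{(n)}\big[\{\Psi^{(i,j)}\}\big] = \frac{1}{2^{nR_{\Sigma}}} \sum_{\{u_k^{1:n}\}_{k \in [m]}} \mathbb{1}\big[(u_1^n, u_2^n, \ldots, u_m^n) \in \tilde{\mathcal{T}}\big] \prod_{i \in [m]} \prod_{j \in [n] : j \notin \mathcal{M}_i^{(n)}} \mathbb{1}\big[\Psi^{(i,j)}\big(u_i^{1:j-1}, \{u_k^{1:n}\}_{k \in [1:i-1]}\big) = u_i(j)\big]. \tag{22}$$

Instead of characterizing $P_e^{(n)}$ directly for deterministic maps,
the analysis of $P_e^{(n)}\big[\{\Psi^{(i,j)}\}\big]$ leads to the following lemma.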

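Lemma 2: Consider the broadcast protocol of Section V-A.
Let $R_i = \frac{1}{n} |\mathcal{M}_i^{(n)}|$ for $i \in [m]$ be the broadcast rates selected
according to the criterion given in (12) in Proposition 3. Then
for $0 < \beta < \frac{1}{2}$ and sufficiently large $n$,

$$\mathbb{E}_{\{\Psi^{(i,j)}\}}\Big[ P_e^{(n)}\big[\{\Psi^{(i,j)}\}\big] \Big] \le 2^{-n^{\beta}}.$$

Proof:

$$\mathbb{E}_{\{\Psi^{(i,j)}\}}\Big[ P_e^{(n)}\big[\{\Psi^{(i,j)}\}\big] \Big] = \sum_{\{u_k^{1:n}\}_{k \in [m]}} \mathbb{1}\big[(u_1^n, u_2^n, \ldots, u_m^n) \in \tilde{\mathcal{T}}\big] \, Q\big(\{u_k^{1:n}\}_{k \in [m]}\big) \tag{23}$$

$$= \sum_{\{u_k^{1:n}\}_{k \in [m]}} \mathbb{1}\big[(u_1^n, u_2^n, \ldots, u_m^n) \in \tilde{\mathcal{T}}\big] \left| P\big(\{u_k^{1:n}\}_{k \in [m]}\big) - Q\big(\{u_k^{1:n}\}_{k \in [m]}\big) \right| \tag{24}$$

$$\le 2^{-n^{\beta}}. \tag{25}$$

Step (23) follows since the probability measure $Q$ matches the
desired calculation exactly. Step (24) is due to the fact that the
probability measure $P$ has zero mass over $m$-tuples of binary
sequences that are inconsistent. Step (25) follows directly from
Lemma 1, since the sum over inconsistent tuples is at most the sum
over all tuples. Lastly, since the expectation over random maps
$\{\Psi^{(i,j)}\}$ of the average probability of error decays
stretched-exponentially, there must exist a set of deterministic maps
which exhibit the same behavior. ■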

VI. Noisy Broadcast Channels 
Superposition Coding 

Coding for noisy broadcast channels is now considered 
using polarization methods. By contrast to the deterministic 
case, a decoding error event occurs at the receivers on account 
of the randomness due to noise. For the remaining sections, it 
is assumed that there exist m = 2 users in the DM-BC. The 
private-message capacity region for the DM-BC is unknown 
even for binary input, binary output two-user channels such 
as the skew-symmetric DM-BC. However, the private-message 
capacity region is known for specific classes. 

A. Special Classes of Noisy DM-BCs 

Definition 8: The two-user physically degraded DM-BC is
a channel $P_{Y_1 Y_2 | X}(y_1, y_2 | x)$ for which $X - Y_1 - Y_2$ form a
Markov chain, i.e. one of the receivers is statistically stronger
than the other:

$$P_{Y_1 Y_2 | X}(y_1, y_2 | x) = P_{Y_1 | X}(y_1 | x) \, P_{Y_2 | Y_1}(y_2 | y_1). \tag{26}$$

Definition 9: A two-user DM-BC $P_{Y_1 Y_2 | X}(y_1, y_2 | x)$ is
stochastically degraded if its conditional marginal distributions
are the same as that of a physically degraded DM-BC, i.e., if
there exists a distribution $P_{Y_2 | Y_1}(y_2 | y_1)$ such that

$$P_{Y_2 | X}(y_2 | x) = \sum_{y_1 \in \mathcal{Y}_1} P_{Y_1 | X}(y_1 | x) \, P_{Y_2 | Y_1}(y_2 | y_1). \tag{27}$$

If (27) holds for two conditional distributions $P_{Y_1 | X}(y_1 | x)$ and
$P_{Y_2 | X}(y_2 | x)$ defined over the same input, then the property is
denoted as follows: $P_{Y_1 | X}(y_1 | x) \succeq P_{Y_2 | X}(y_2 | x)$.

Fig. 3. The special classes of noisy broadcast channels as described in
Section VI-A. Class I represents stochastically degraded DM-BCs. Class
II represents broadcast channels for which $V - X - (Y_1, Y_2)$ and
$P_{Y_1 | V}(y_1 | v) \succeq P_{Y_2 | V}(y_2 | v)$ for all $P_{X | V}(x | v)$. Class II is equivalent
to Class I. Class III represents less-noisy DM-BCs. Class IV represents
broadcast channels with the more capable property.
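Condition (27) is a statement about row-stochastic matrices: the transition matrix of the weaker channel must factor as the stronger channel's matrix times some degrading channel. The following sketch checks this for two binary symmetric channels, where the degrading channel is itself a BSC with crossover $q = (p_2 - p_1)/(1 - 2p_1)$, obtained by solving $p_1 * q = p_2$. The function names are illustrative, and the formula assumes $0 \le p_1 \le p_2 \le \frac{1}{2}$ with $p_1 < \frac{1}{2}$.

```python
import numpy as np


def bsc(p: float) -> np.ndarray:
    """Row-stochastic transition matrix of a BSC: entry [x, y] = P(Y = y | X = x)."""
    return np.array([[1 - p, p], [p, 1 - p]])


def degrading_crossover(p1: float, p2: float) -> float:
    """Crossover q of the BSC that maps BSC(p1) to BSC(p2), i.e. p1 * q = p2.

    Assumes 0 <= p1 <= p2 <= 1/2 and p1 < 1/2.
    """
    return (p2 - p1) / (1 - 2 * p1)


if __name__ == "__main__":
    p1, p2 = 0.1, 0.2
    q = degrading_crossover(p1, p2)
    # Eq. (27) in matrix form: P_{Y2|X} = P_{Y1|X} @ P_{Y2|Y1}.
    print(q, np.allclose(bsc(p1) @ bsc(q), bsc(p2)))
```

The matrix product sums over the intermediate output $y_1$, which is exactly the summation on the right-hand side of (27).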

Definition 10: A two-user DM-BC $P_{Y_1 Y_2 | X}(y_1, y_2 | x)$ for
which $V - X - (Y_1, Y_2)$ forms a Markov chain is said to
be less noisy if

$$\forall P_{VX}(v, x) : I(V; Y_1) \ge I(V; Y_2). \tag{28}$$

Definition 11: A two-user DM-BC $P_{Y_1 Y_2 | X}(y_1, y_2 | x)$ is
said to be more capable if

$$\forall P_X(x) : I(X; Y_1) \ge I(X; Y_2). \tag{29}$$


The following lemma relates the properties of the special
classes of noisy broadcast channels. A more comprehensive
treatment of special classes is given by C. Nair in [36].

Lemma 3: Consider a two-user DM-BC $P_{Y_1 Y_2 | X}(y_1, y_2 | x)$.
Let $V - X - (Y_1, Y_2)$ form a Markov chain, $|\mathcal{V}| > 1$, and
$P_V(v) > 0$. The following implications hold:

$$X - Y_1 - Y_2$$
$$\Rightarrow P_{Y_1 | X}(y_1 | x) \succeq P_{Y_2 | X}(y_2 | x) \tag{30}$$
$$\Leftrightarrow \forall P_{X | V}(x | v) : P_{Y_1 | V}(y_1 | v) \succeq P_{Y_2 | V}(y_2 | v) \tag{31}$$
$$\Rightarrow \forall P_{VX}(v, x) : I(V; Y_1) \ge I(V; Y_2) \tag{32}$$
$$\Rightarrow \forall P_X(x) : I(X; Y_1) \ge I(X; Y_2). \tag{33}$$

The converse statements for (30), (32), and (33) do not hold
in general. Figure 3 illustrates the different types of broadcast
channels as a hierarchy.

Proof: See Section E of the Appendices. ■



B. Cover's Inner Bound 

Superposition coding involves one auxiliary random variable
$V$ which conveys a "cloud center" or a coarse message
decoded by both receivers [1]. One of the receivers then
decodes an additional "satellite codeword" conveyed through
$X$ containing a fine-grain message that is superimposed upon
the coarse information.

Proposition 4 (Cover's Inner Bound): For any two-user
DM-BC, the rates $(R_1, R_2) \in \mathbb{R}_+^2$ in the region
$\mathfrak{R}(X, V, Y_1, Y_2)$ are achievable where

$$\mathfrak{R}(X, V, Y_1, Y_2) \triangleq \big\{ R_1, R_2 \ \big|\ R_1 \le I(X; Y_1 | V), \ R_2 \le I(V; Y_2), \ R_1 + R_2 \le I(X; Y_1) \big\}, \tag{34}$$

and where random variables $X, V, Y_1, Y_2$ obey the Markov
chain $V - X - (Y_1, Y_2)$.

Remark 6: Cover's inner bound is applicable for any broadcast
channel. By symmetry, the following rate region is also
achievable: $\{R_1, R_2 \mid R_2 \le I(X; Y_2 | V), \ R_1 \le I(V; Y_1), \ R_1 +
R_2 \le I(X; Y_2)\}$ for random variables obeying the Markov
chain $V - X - (Y_1, Y_2)$.

Remark 7: The inner bound is the capacity region for
degraded, less-noisy, and more-capable DM-BCs (i.e. Class I
through Class IV as shown in Figure 3). For the degraded
and less-noisy special classes, the capacity region is
simplified to $\{R_1, R_2 \mid R_1 \le I(X; Y_1 | V), \ R_2 \le I(V; Y_2)\}$.
To see this, note that $I(V; Y_2) \le I(V; Y_1)$ which implies
$I(V; Y_2) + I(X; Y_1 | V) \le I(V; Y_1) + I(X; Y_1 | V) = I(X; Y_1)$.
Therefore the sum-rate constraint $R_1 + R_2 \le I(X; Y_1)$ of the
rate-region in (34) is automatically satisfied.

Example 2 (Binary Symmetric DM-BC): The two-user binary
symmetric DM-BC consists of a binary symmetric channel
with flip probability $p_1$ denoted as BSC($p_1$) and a second
channel BSC($p_2$). Assume that $p_1 < p_2 < \frac{1}{2}$, which implies
stochastic degradation as defined in (27). For $\alpha \in [0,\frac{1}{2}]$,
Cover's superposition inner bound is the region

$$\Big\{ R_1, R_2 \;\Big|\; R_1 \leq h_b(\alpha * p_1) - h_b(p_1), \;\; R_2 \leq 1 - h_b(\alpha * p_2) \Big\}. \tag{35}$$

The above inner bound is determined by evaluating (34) where
$V$ is a Bernoulli random variable with $P_V(v) = \frac{1}{2}$, $X = V \oplus S$,
and $S$ is a Bernoulli random variable with $P_S(1) = \alpha$.
Figure 4 plots this rectangular inner bound for two different
values of $\alpha$. The corner points of this rectangle
lie on the capacity boundary.
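The corner point of the rectangle in (35) is easy to evaluate numerically. The following is a minimal sketch in Python; the helper names `hb`, `conv`, and `superposition_corner` are illustrative choices and are not part of the paper.

```python
import math

def hb(x):
    """Binary entropy function h_b(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def conv(a, p):
    """Binary convolution a * p = a(1 - p) + (1 - a)p."""
    return a * (1.0 - p) + (1.0 - a) * p

def superposition_corner(alpha, p1, p2):
    """Corner point (R1, R2) of the rectangle in (35) for a given alpha."""
    r1 = hb(conv(alpha, p1)) - hb(p1)
    r2 = 1.0 - hb(conv(alpha, p2))
    return r1, r2
```

Sweeping `alpha` over $[0, \frac{1}{2}]$ traces the corner points from $(0,\, 1 - h_b(p_2))$ to $(1 - h_b(p_1),\, 0)$.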

Example 3 (DM-BC with BEC($\epsilon$) and BSC($p$) [36]):
Consider a two-user DM-BC comprised of a BSC($p$) from
$X$ to $Y_1$ and a BEC($\epsilon$) from $X$ to $Y_2$. Then it can be shown
that the following cases hold:

• $0 < \epsilon \leq 2p$: $Y_1$ is degraded with respect to $Y_2$.

• $2p < \epsilon \leq 4p(1-p)$: $Y_2$ is less noisy than $Y_1$ but $Y_1$ is
not degraded with respect to $Y_2$.

• $4p(1-p) < \epsilon \leq h_b(p)$: $Y_2$ is more capable than $Y_1$ but
not less noisy.

• $h_b(p) < \epsilon < 1$: The channel does not belong to the
special classes.

The capacity region for all channel parameters for this example
is achieved using superposition coding.
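The four cases of Example 3 can be checked mechanically. The sketch below assumes $0 < p < \frac{1}{2}$ and $0 < \epsilon < 1$, and assumes the interval endpoints are closed on the right as listed above; the function name and the returned labels are illustrative only.

```python
import math

def hb(x):
    """Binary entropy function h_b(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def classify_bsc_bec(p, eps):
    """Classify the DM-BC with a BSC(p) to Y1 and a BEC(eps) to Y2."""
    if not (0.0 < p < 0.5 and 0.0 < eps < 1.0):
        raise ValueError("expected 0 < p < 1/2 and 0 < eps < 1")
    if eps <= 2.0 * p:
        return "Y1 degraded w.r.t. Y2"
    if eps <= 4.0 * p * (1.0 - p):
        return "Y2 less noisy than Y1"
    if eps <= hb(p):
        return "Y2 more capable than Y1"
    return "no special class"
```

For $p = 0.1$ the thresholds are $2p = 0.2$, $4p(1-p) = 0.36$, and $h_b(p) \approx 0.469$.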

C. Main Result 

Theorem 2 (Polarization-Based Superposition Code):
Consider any two-user DM-BC with binary input alphabet
$\mathcal{X} = \{0,1\}$ and arbitrary output alphabets $\mathcal{Y}_1$, $\mathcal{Y}_2$. There
exists a sequence of polar broadcast codes over $n$ channel
uses which achieves the following rate region

$$\mathcal{R}(V,X,Y_1,Y_2) \triangleq \Big\{ R_1, R_2 \;\Big|\; R_1 \leq I(X;Y_1|V), \;\; R_2 \leq I(V;Y_2) \Big\}, \tag{36}$$

where random variables $V, X, Y_1, Y_2$ have the following listed
properties:

• $V$ is a binary random variable.

• $P_{Y_1|V}(y_1|v) \succ P_{Y_2|V}(y_2|v)$.

• $V - X - (Y_1,Y_2)$ form a Markov chain.

For $0 < \beta < \frac{1}{2}$, the average error probability of this code
sequence decays as $P_e^{(n)} = O(2^{-n^{\beta}})$. The complexity of
encoding and decoding is $O(n \log n)$.

Remark 8: The requirement that auxiliary $V$ is a binary
random variable is due to the use of binary polarization
theorems in the proof. Indeed, the auxiliary $V$ may need to
have a larger alphabet in the case of broadcast channels. An
extension to $q$-ary random variables is entirely possible if $q$-ary
polarization theorems are utilized.

Remark 9: The requirement that $V - X - (Y_1,Y_2)$ holds
is standard for superposition coding over noisy channels.
However, the listed property $P_{Y_1|V}(y_1|v) \succ P_{Y_2|V}(y_2|v)$ is
due to the structure of polarization and is used in the proof
to guarantee that polarization indices are aligned. If both
receivers are able to decode the coarse message carried by the
auxiliary random variable $V$, the polarization indices for the
coarse message must be nested for the two receivers' channels.

VII. Proof of Theorem 2

The block diagram for polarization-based superposition
coding is given in Figure 5. Similar to random codes in
Shannon theory, polarization-based codes rely on $n$-length
i.i.d. statistics of random variables; however, a specific
polarization structure based on the chain rule of entropy allows
for efficient encoding and decoding. The key idea of Cover's
inner bound is to superimpose two messages of information
onto one codeword.



Fig. 4. DM-BC with BSCs: The classic two-user broadcast channel consisting
of a BSC($p_1$) and a BSC($p_2$). The private-message capacity
region is equivalent to the superposition coding inner bound. For a fixed
auxiliary and input distribution $P_{VX}(v,x)$, the superposition inner bound
is plotted as a rectangle for two values of $\alpha$ as described in
Example 2. For this example, polar codes achieve all points on the capacity
boundary.

A. Polar Transform 

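Consider the i.i.d. sequence of random variables
$(V^j, X^j, Y_1^j, Y_2^j) \sim P_V(v) P_{X|V}(x|v) P_{Y_1Y_2|X}(y_1,y_2|x)$
where the index $j \in [n]$. Let the $n$-length sequence of
auxiliary and input variables $(V^j, X^j)$ be organized into the
random matrix

$$\Omega \triangleq \begin{bmatrix} X^1 & X^2 & X^3 & \cdots & X^n \\ V^1 & V^2 & V^3 & \cdots & V^n \end{bmatrix}. \tag{37}$$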



Applying the polar transform to $\Omega$ results in the random matrix
$\mathbf{U} = \Omega G_n$. Let the random variables in the random matrix $\mathbf{U}$
be indexed as follows:

$$\mathbf{U} = \begin{bmatrix} U_1^1 & U_1^2 & U_1^3 & \cdots & U_1^n \\ U_2^1 & U_2^2 & U_2^3 & \cdots & U_2^n \end{bmatrix}. \tag{38}$$

The above definitions are consistent with the block diagram
given in Figure 5 (and noting that $G_n = G_n^{-1}$). The polar
transform extracts the randomness of $\Omega$. In the transformed
domain, the joint distribution of the random variables in $\mathbf{U}$ is
given by

$$P_{U_1^n U_2^n}(u_1^n, u_2^n) = P_{X^n V^n}(u_1^n G_n, u_2^n G_n). \tag{39}$$

Fig. 5. Block diagram of a polarization-based superposition code for a two-user noisy broadcast channel.



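For polar coding purposes, the joint distribution is decomposed
as follows,

$$P_{U_1^n U_2^n}(u_1^n, u_2^n) = P_{U_2^n}(u_2^n)\, P_{U_1^n|U_2^n}(u_1^n|u_2^n) = \prod_{j=1}^{n} P\big(u_2(j) \,\big|\, u_2^{1:j-1}\big)\, P\big(u_1(j) \,\big|\, u_1^{1:j-1}, u_2^n\big). \tag{40}$$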

The conditional distributions may be computed efficiently 
using recursive protocols as already mentioned. The polarized 
variables in U are not i.i.d. random variables. 
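To make the transform concrete, the following Python sketch applies $G_n$ to one row of $\Omega$. It assumes that $G_n$ is the Kronecker power of the $2 \times 2$ kernel $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ over GF(2), without a bit-reversal permutation; this chunk of the paper does not restate the definition of $G_n$, so that choice is an assumption. The function name `polar_transform` is illustrative.

```python
def polar_transform(bits):
    """Return bits * G_n over GF(2).

    Assumption: G_n is the Kronecker power of [[1, 0], [1, 1]]
    (no bit-reversal). The length must be a power of two.
    """
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    if n == 1:
        return list(bits)
    half = n // 2
    upper, lower = bits[:half], bits[half:]
    # (a, b) * [[F, 0], [F, F]] = ((a xor b) F, b F)
    mixed = [a ^ b for a, b in zip(upper, lower)]
    return polar_transform(mixed) + polar_transform(lower)
```

Applying the function twice returns the input, which is the property $G_n = G_n^{-1}$ used above. The recursion performs $O(n \log n)$ XOR operations.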

B. Polarization Theorems Revisited 

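Definition 12 (Polarization Sets for Superposition Coding):
Let $V^n, X^n, Y_1^n, Y_2^n$ be the sequence of random variables as
introduced in Section VII-A. In addition, let $U_1^n = X^n G_n$
and $U_2^n = V^n G_n$. Let $\delta_n = 2^{-n^{\beta}}$ for $0 < \beta < \frac{1}{2}$. The
following polarization sets are defined:

$$\mathcal{H}^{(n)}_{X|V} \triangleq \Big\{ j \in [n] : Z\big(U_1(j) \,\big|\, U_1^{1:j-1}, V^n\big) \geq 1 - \delta_n \Big\},$$
$$\mathcal{L}^{(n)}_{X|VY_1} \triangleq \Big\{ j \in [n] : Z\big(U_1(j) \,\big|\, U_1^{1:j-1}, V^n, Y_1^n\big) \leq \delta_n \Big\},$$
$$\mathcal{H}^{(n)}_{V} \triangleq \Big\{ j \in [n] : Z\big(U_2(j) \,\big|\, U_2^{1:j-1}\big) \geq 1 - \delta_n \Big\},$$
$$\mathcal{L}^{(n)}_{V|Y_1} \triangleq \Big\{ j \in [n] : Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, Y_1^n\big) \leq \delta_n \Big\},$$
$$\mathcal{L}^{(n)}_{V|Y_2} \triangleq \Big\{ j \in [n] : Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, Y_2^n\big) \leq \delta_n \Big\}.$$

Definition 13 (Message Sets for Superposition Coding):
In terms of the polarization sets given in Definition 12, the
following message sets are defined:

$$\mathcal{M}^{(n)}_{1} \triangleq \mathcal{H}^{(n)}_{X|V} \cap \mathcal{L}^{(n)}_{X|VY_1}, \tag{41}$$
$$\mathcal{M}^{(n)}_{1v} \triangleq \mathcal{H}^{(n)}_{V} \cap \mathcal{L}^{(n)}_{V|Y_1}, \tag{42}$$
$$\mathcal{M}^{(n)}_{2} \triangleq \mathcal{H}^{(n)}_{V} \cap \mathcal{L}^{(n)}_{V|Y_2}. \tag{43}$$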



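Proposition 5 (Polarization): Consider the polarization sets
given in Definition 12 and the message sets given in
Definition 13 with parameter $\delta_n = 2^{-n^{\beta}}$ for $0 < \beta < \frac{1}{2}$. Fix a
constant $\tau > 0$. Then there exists an $N_0 = N_0(\beta, \tau)$ such that

$$\frac{1}{n} \big| \mathcal{M}^{(n)}_{1} \big| \geq \big( H(X|V) - H(X|V,Y_1) \big) - \tau, \tag{44}$$
$$\frac{1}{n} \big| \mathcal{M}^{(n)}_{2} \big| \geq \big( H(V) - H(V|Y_2) \big) - \tau, \tag{45}$$

for all $n > N_0$.

Lemma 4: Consider the message sets defined in Definition 13.
If the property $P_{Y_1|V}(y_1|v) \succ P_{Y_2|V}(y_2|v)$ holds for
conditional distributions $P_{Y_1|V}(y_1|v)$ and $P_{Y_2|V}(y_2|v)$, then
the Bhattacharyya parameters satisfy

$$Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, Y_1^n\big) \leq Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, Y_2^n\big)$$

for all $j \in [n]$. As a result, $\mathcal{M}^{(n)}_{2} \subseteq \mathcal{M}^{(n)}_{1v}$.

Proof: The proof follows from Lemma 12 and repeated
application of Lemma 13 in Appendix A. ■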
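For very small block lengths, the source Bhattacharyya parameters $Z(U_2(j) \mid U_2^{1:j-1})$ that define $\mathcal{H}^{(n)}_V$ can be computed by brute force. The sketch below is only an illustration: it assumes $V^n$ is i.i.d. Bernoulli($p$), assumes $G_n$ is the plain Kronecker power of $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, uses $Z(U|W) = 2 \sum_w \sqrt{P(0,w) P(1,w)}$, and enumerates all $2^n$ sequences instead of using the efficient recursions the paper refers to. Indices in the code are 0-based, and all function names are hypothetical.

```python
import itertools
import math

def polar_transform(bits):
    """bits * G_n over GF(2), assuming G_n is a Kronecker power of [[1,0],[1,1]]."""
    n = len(bits)
    if n == 1:
        return list(bits)
    half = n // 2
    mixed = [a ^ b for a, b in zip(bits[:half], bits[half:])]
    return polar_transform(mixed) + polar_transform(bits[half:])

def source_bhattacharyya(p, n):
    """Return [Z(U(j) | U^{1:j-1})] for U^n = V^n G_n, V i.i.d. Bernoulli(p)."""
    joint = {}
    for v in itertools.product((0, 1), repeat=n):
        weight = sum(v)
        joint[tuple(polar_transform(list(v)))] = p ** weight * (1 - p) ** (n - weight)
    z = []
    for j in range(n):
        prefix_prob = {}
        for u, q in joint.items():
            prefix_prob[u[: j + 1]] = prefix_prob.get(u[: j + 1], 0.0) + q
        total = 0.0
        for head in itertools.product((0, 1), repeat=j):
            total += math.sqrt(prefix_prob[head + (0,)] * prefix_prob[head + (1,)])
        z.append(2.0 * total)
    return z

def high_entropy_set(z, delta):
    """Indices j (0-based) with Z >= 1 - delta, in the spirit of H_V^(n)."""
    return {j for j, value in enumerate(z) if value >= 1.0 - delta}
```

For $n = 2$ the second parameter equals $4p(1-p)$, the square of the single-letter value $2\sqrt{p(1-p)}$, which is the one-step polarization behaviour.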



C. Broadcast Encoding Based on Polarization 

The polarization theorems of the previous section are
useful for defining a multi-user communication system as
diagrammed in Figure 5. The broadcast encoder must map two
independent messages $(W_1, W_2)$ uniformly distributed over
$[2^{nR_1}] \times [2^{nR_2}]$ to a codeword $x^n \in \mathcal{X}^n$ in such a way
that the decoding at each separate receiver is successful. The
achievable rates for a particular block length $n$ are

$$R_1 = \frac{1}{n} \big| \mathcal{M}^{(n)}_{1} \big|, \qquad R_2 = \frac{1}{n} \big| \mathcal{M}^{(n)}_{2} \big|.$$

To construct a codeword, the encoder first produces two
binary sequences $u_1^n \in \{0,1\}^n$ and $u_2^n \in \{0,1\}^n$. To
determine $u_1(j)$ for $j \in \mathcal{M}^{(n)}_1$, the bit is selected as a uniformly
distributed message bit intended for the first receiver. To
determine $u_2(j)$ for $j \in \mathcal{M}^{(n)}_2$, the bit is selected as a
uniformly distributed message bit intended for the second
receiver. The remaining non-message indices of $u_1^n$ and $u_2^n$
are computed according to deterministic or random functions
which are shared between the encoder and decoder.

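1) Deterministic Mapping: Consider the following
deterministic boolean functions indexed by $j \in [n]$:

$$\psi_1^{(j)} : \{0,1\}^{n+j-1} \to \{0,1\}, \tag{46}$$
$$\psi_2^{(j)} : \{0,1\}^{j-1} \to \{0,1\}. \tag{47}$$

As an example, consider the deterministic boolean functions
based on the maximum a posteriori polar coding rule.

$$\psi_1^{(j)}\big(u_1^{1:j-1}, v^n\big) \triangleq \arg\max_{u \in \{0,1\}} \Big\{ P\big(U_1(j) = u \,\big|\, U_1^{1:j-1} = u_1^{1:j-1}, V^n = v^n\big) \Big\}, \tag{48}$$
$$\psi_2^{(j)}\big(u_2^{1:j-1}\big) \triangleq \arg\max_{u \in \{0,1\}} \Big\{ P\big(U_2(j) = u \,\big|\, U_2^{1:j-1} = u_2^{1:j-1}\big) \Big\}. \tag{49}$$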



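2) Random Mapping: Consider the following class of
random boolean functions indexed by $j \in [n]$:

$$\Psi_1^{(j)} : \{0,1\}^{n+j-1} \to \{0,1\}, \tag{50}$$
$$\Psi_2^{(j)} : \{0,1\}^{j-1} \to \{0,1\}. \tag{51}$$

As an example, consider the random boolean functions

$$\Psi_1^{(j)}\big(u_1^{1:j-1}, v^n\big) \triangleq \begin{cases} 0, & \text{w.p. } \lambda_0\big(u_1^{1:j-1}, v^n\big), \\ 1, & \text{w.p. } 1 - \lambda_0\big(u_1^{1:j-1}, v^n\big), \end{cases} \tag{52}$$
$$\Psi_2^{(j)}\big(u_2^{1:j-1}\big) \triangleq \begin{cases} 0, & \text{w.p. } \lambda_0\big(u_2^{1:j-1}\big), \\ 1, & \text{w.p. } 1 - \lambda_0\big(u_2^{1:j-1}\big), \end{cases} \tag{53}$$

where

$$\lambda_0\big(u_1^{1:j-1}, v^n\big) \triangleq P\big(U_1(j) = 0 \,\big|\, U_1^{1:j-1} = u_1^{1:j-1}, V^n = v^n\big),$$
$$\lambda_0\big(u_2^{1:j-1}\big) \triangleq P\big(U_2(j) = 0 \,\big|\, U_2^{1:j-1} = u_2^{1:j-1}\big).$$

The random boolean functions $\Psi_1^{(j)}$ and $\Psi_2^{(j)}$ may be thought
of as a vector of independent Bernoulli random variables
indexed by the input to the function. Each Bernoulli random
variable of the vector is zero or one with a fixed probability.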
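One way to picture a shared random boolean function is as a table of Bernoulli draws keyed by the function input, reproduced identically at the encoder and the decoders from common randomness. The Python sketch below is an illustration under that assumption; the class name `SharedRandomMap`, the use of a seed as the shared randomness, and the callable `lambda0` (standing in for $\lambda_0$) are all hypothetical and not taken from the paper.

```python
import random

class SharedRandomMap:
    """A random boolean function Psi: one Bernoulli draw per distinct input.

    lambda0(*args) returns the probability that the output is 0.
    Two instances built with the same seed agree on every input,
    regardless of the order in which inputs are queried.
    """

    def __init__(self, lambda0, seed):
        self._lambda0 = lambda0
        self._seed = seed

    def __call__(self, *args):
        # Derive the draw deterministically from (seed, input) so that the
        # encoder and decoder copies return the same bit for the same input.
        rng = random.Random(repr((self._seed, args)))
        return 0 if rng.random() < self._lambda0(*args) else 1
```

Keying each draw on the input, rather than on the query order, is a design choice that keeps the two copies consistent even if the encoder and a decoder evaluate the function on different sequences of inputs.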

3) Protocol: The encoder constructs the sequence $u_2^n$ first
using the message bits $W_2$ and either (49) or (53). Next, the
sequence $v^n = u_2^n G_n$ is created. Finally, the sequence $u_1^n$ is
constructed using the message bits $W_1$, the sequence $v^n$, and
either the deterministic maps defined in (48) or the randomized
maps in (52). The transmitted codeword is $x^n = u_1^n G_n$.
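The order of operations in the protocol can be sketched as follows. This is a structural illustration only: the callables `rule1` and `rule2` stand in for the maps (48)/(52) and (49)/(53) and must be supplied by the caller, $G_n$ is assumed to be the plain Kronecker power of $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, indices are 0-based, and every function name is hypothetical.

```python
def polar_transform(bits):
    """bits * G_n over GF(2), assuming G_n is a Kronecker power of [[1,0],[1,1]]."""
    n = len(bits)
    if n == 1:
        return list(bits)
    half = n // 2
    mixed = [a ^ b for a, b in zip(bits[:half], bits[half:])]
    return polar_transform(mixed) + polar_transform(bits[half:])

def successive_fill(n, message_set, message_bits, rule):
    """Fill positions 0..n-1: message bits on message_set, rule(j, prefix) elsewhere."""
    pending = iter(message_bits)
    u = []
    for j in range(n):
        u.append(next(pending) if j in message_set else rule(j, tuple(u)))
    return u

def superposition_encode(n, m1, w1, m2, w2, rule1, rule2):
    """Return (u1, u2, v, x) following the three steps of the protocol."""
    u2 = successive_fill(n, m2, w2, rule2)           # step 1: u2 from W2
    v = polar_transform(u2)                          # step 2: v = u2 G_n
    u1 = successive_fill(n, m1, w1,                  # step 3: u1 from W1 and v
                         lambda j, prefix: rule1(j, prefix, tuple(v)))
    x = polar_transform(u1)                          # codeword x = u1 G_n
    return u1, u2, v, x
```

With $n = 4$, message sets `m2 = {3}`, `m1 = {2, 3}`, and rules that always return 0, the sketch yields `v = [1, 1, 1, 1]` and `x = [1, 0, 1, 0]` for `w2 = [1]`, `w1 = [1, 0]`.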



D. Broadcast Decoding Based on Polarization 

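1) Decoding At First Receiver: Decoder $\mathcal{D}_1$ decodes the
binary sequence $\hat{u}_2^n$ first using its observations $y_1^n$. It then
reconstructs $\hat{v}^n = \hat{u}_2^n G_n$. Using the sequence $\hat{v}^n$ and
observations $y_1^n$, the decoder reconstructs $\hat{u}_1^n$. The message $W_1$ is
located at the indices $j \in \mathcal{M}^{(n)}_1$ in the sequence $\hat{u}_1^n$. More
precisely, define the following deterministic polar decoding
functions:

$$\xi_{v}^{(j)}\big(u_2^{1:j-1}, y_1^n\big) \triangleq \arg\max_{u \in \{0,1\}} \Big\{ P\big(U_2(j) = u \,\big|\, U_2^{1:j-1} = u_2^{1:j-1}, Y_1^n = y_1^n\big) \Big\}, \tag{54}$$
$$\xi_{u_1}^{(j)}\big(u_1^{1:j-1}, v^n, y_1^n\big) \triangleq \arg\max_{u \in \{0,1\}} \Big\{ P\big(U_1(j) = u \,\big|\, U_1^{1:j-1} = u_1^{1:j-1}, V^n = v^n, Y_1^n = y_1^n\big) \Big\}. \tag{55}$$

The decoder $\mathcal{D}_1$ reconstructs $\hat{u}_2^n$ bit-by-bit successively as
follows using the identical shared random mapping $\Psi_2^{(j)}$
(or possibly the identical shared mapping $\psi_2^{(j)}$) used at the
encoder:

$$\hat{u}_2(j) = \begin{cases} \xi_{v}^{(j)}\big(\hat{u}_2^{1:j-1}, y_1^n\big), & \text{if } j \in \mathcal{M}^{(n)}_{1v}, \\ \Psi_2^{(j)}\big(\hat{u}_2^{1:j-1}\big), & \text{otherwise.} \end{cases} \tag{56}$$

If Lemma 4 holds, note that $\mathcal{M}^{(n)}_2 \subseteq \mathcal{M}^{(n)}_{1v}$. With $\hat{u}_2^n$,
decoder $\mathcal{D}_1$ reconstructs $\hat{v}^n = \hat{u}_2^n G_n$. Then the sequence
$\hat{u}_1^n$ is constructed bit-by-bit successively as follows using
the identical shared random mapping $\Psi_1^{(j)}$ (or possibly the
identical shared mapping $\psi_1^{(j)}$) used at the encoder:

$$\hat{u}_1(j) = \begin{cases} \xi_{u_1}^{(j)}\big(\hat{u}_1^{1:j-1}, \hat{v}^n, y_1^n\big), & \text{if } j \in \mathcal{M}^{(n)}_{1}, \\ \Psi_1^{(j)}\big(\hat{u}_1^{1:j-1}, \hat{v}^n\big), & \text{otherwise.} \end{cases} \tag{57}$$

2) Decoding At Second Receiver: The decoder $\mathcal{D}_2$ decodes
the binary sequence $\hat{u}_2^n$ using observations $y_2^n$. The message
$W_2$ is located at the indices $j \in \mathcal{M}^{(n)}_2$ of the sequence $\hat{u}_2^n$.
More precisely, define the following polar decoding functions

$$\xi_{u_2}^{(j)}\big(u_2^{1:j-1}, y_2^n\big) \triangleq \arg\max_{u \in \{0,1\}} \Big\{ P\big(U_2(j) = u \,\big|\, U_2^{1:j-1} = u_2^{1:j-1}, Y_2^n = y_2^n\big) \Big\}. \tag{58}$$

The decoder $\mathcal{D}_2$ reconstructs $\hat{u}_2^n$ bit-by-bit successively as
follows using the identical shared random mapping $\Psi_2^{(j)}$
(or possibly the identical shared mapping $\psi_2^{(j)}$) used at the
encoder:

$$\hat{u}_2(j) = \begin{cases} \xi_{u_2}^{(j)}\big(\hat{u}_2^{1:j-1}, y_2^n\big), & \text{if } j \in \mathcal{M}^{(n)}_{2}, \\ \Psi_2^{(j)}\big(\hat{u}_2^{1:j-1}\big), & \text{otherwise.} \end{cases} \tag{59}$$

Remark 10: The encoder and decoders execute the same
protocol for reconstructing bits at the non-message indices.
This is achieved by applying the same deterministic maps $\psi_1^{(j)}$
and $\psi_2^{(j)}$ or randomized maps $\Psi_1^{(j)}$ and $\Psi_2^{(j)}$.

E. Total Variation Bound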



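To analyze the average probability of error $P_e^{(n)}$ via the
probabilistic method, it is assumed that both the encoder and
decoder share the randomized mappings $\Psi_1^{(j)}$ and $\Psi_2^{(j)}$. Define
the following probability measure on the space of tuples of
binary sequences.

$$Q(u_1^n, u_2^n) \triangleq Q(u_2^n)\, Q(u_1^n | u_2^n) = \prod_{j=1}^{n} Q\big(u_2(j) \,\big|\, u_2^{1:j-1}\big)\, Q\big(u_1(j) \,\big|\, u_1^{1:j-1}, u_2^n\big). \tag{60}$$

In (60), the conditional probability measures are defined as

$$Q\big(u_2(j) \,\big|\, u_2^{1:j-1}\big) \triangleq \begin{cases} \frac{1}{2}, & \text{if } j \in \mathcal{M}^{(n)}_2, \\ P\big(u_2(j) \,\big|\, u_2^{1:j-1}\big), & \text{otherwise,} \end{cases}$$
$$Q\big(u_1(j) \,\big|\, u_1^{1:j-1}, u_2^n\big) \triangleq \begin{cases} \frac{1}{2}, & \text{if } j \in \mathcal{M}^{(n)}_1, \\ P\big(u_1(j) \,\big|\, u_1^{1:j-1}, u_2^n\big), & \text{otherwise.} \end{cases}$$

The probability measure $Q$ defined in (60) is a perturbation
of the joint probability measure $P_{U_1^n U_2^n}(u_1^n, u_2^n)$ in (40). The
only difference in definition between $P$ and $Q$ is due to those
indices in message sets $\mathcal{M}^{(n)}_1$ and $\mathcal{M}^{(n)}_2$. The following lemma
provides a bound on the total variation distance between $P$
and $Q$. The lemma establishes the fact that inserting uniformly
distributed message bits in the proper indices $\mathcal{M}^{(n)}_1$ and $\mathcal{M}^{(n)}_2$
at the encoder does not perturb the statistics of the $n$-length
random variables too much.

Lemma 5: (Total Variation Bound) Let probability measures
$P$ and $Q$ be defined as in (40) and (60) respectively.
Let $0 < \beta < 1$. For sufficiently large $n$, the total variation
distance between $P$ and $Q$ is bounded as

$$\sum_{\substack{u_1^n \in \{0,1\}^n \\ u_2^n \in \{0,1\}^n}} \Big| P_{U_1^n U_2^n}(u_1^n, u_2^n) - Q(u_1^n, u_2^n) \Big| \leq 2^{-n^{\beta}}.$$

Proof: See Section C of the Appendices. ■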



F. Error Sequences 

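The decoding protocols for $\mathcal{D}_1$ and $\mathcal{D}_2$ were established in
Section VII-D. To analyze the probability of error of successive
cancelation (SC) decoding, consider the sequences $u_1^n$ and
$u_2^n$ formed at the encoder, and the resulting observations $y_1^n$
and $y_2^n$ received by the decoders. It is convenient to group the
sequences together and consider all tuples $(u_1^n, u_2^n, y_1^n, y_2^n)$.

Decoder $\mathcal{D}_1$ makes an SC decoding error on the $j$-th bit for
the following tuples:

$$\mathcal{T}_{1v}^{j} \triangleq \Big\{ (u_1^n, u_2^n, y_1^n, y_2^n) : P_{U_2^j | U_2^{1:j-1} Y_1^n}\big(u_2(j) \,\big|\, u_2^{1:j-1}, y_1^n\big) \leq P_{U_2^j | U_2^{1:j-1} Y_1^n}\big(u_2(j) \oplus 1 \,\big|\, u_2^{1:j-1}, y_1^n\big) \Big\},$$
$$\mathcal{T}_{1}^{j} \triangleq \Big\{ (u_1^n, u_2^n, y_1^n, y_2^n) : P_{U_1^j | U_1^{1:j-1} V^n Y_1^n}\big(u_1(j) \,\big|\, u_1^{1:j-1}, u_2^n G_n, y_1^n\big) \leq P_{U_1^j | U_1^{1:j-1} V^n Y_1^n}\big(u_1(j) \oplus 1 \,\big|\, u_1^{1:j-1}, u_2^n G_n, y_1^n\big) \Big\}. \tag{61}$$

The set $\mathcal{T}_{1v}^{j}$ represents those tuples causing an error at $\mathcal{D}_1$
in the case $u_2(j)$ is inconsistent with respect to observations
$y_1^n$ and the decoding rule. The set $\mathcal{T}_{1}^{j}$ represents those tuples
causing an error at $\mathcal{D}_1$ in the case $u_1(j)$ is inconsistent with
respect to $v^n = u_2^n G_n$, observations $y_1^n$, and the decoding
rule. Similarly, decoder $\mathcal{D}_2$ makes an SC decoding error on
the $j$-th bit for the following tuples:

$$\mathcal{T}_{2}^{j} \triangleq \Big\{ (u_1^n, u_2^n, y_1^n, y_2^n) : P_{U_2^j | U_2^{1:j-1} Y_2^n}\big(u_2(j) \,\big|\, u_2^{1:j-1}, y_2^n\big) \leq P_{U_2^j | U_2^{1:j-1} Y_2^n}\big(u_2(j) \oplus 1 \,\big|\, u_2^{1:j-1}, y_2^n\big) \Big\}.$$

The set $\mathcal{T}_{2}^{j}$ represents those tuples causing an error at $\mathcal{D}_2$ in
the case $u_2(j)$ is inconsistent with respect to observations $y_2^n$
and the decoding rule. Since both decoders $\mathcal{D}_1$ and $\mathcal{D}_2$ only
declare errors for those indices in the message sets, the set of
tuples causing an error is

$$\mathcal{T}_{1v} \triangleq \bigcup_{j \in \mathcal{M}^{(n)}_{1v}} \mathcal{T}_{1v}^{j}, \tag{62}$$
$$\mathcal{T}_{1} \triangleq \bigcup_{j \in \mathcal{M}^{(n)}_{1}} \mathcal{T}_{1}^{j}, \tag{63}$$
$$\mathcal{T}_{2} \triangleq \bigcup_{j \in \mathcal{M}^{(n)}_{2}} \mathcal{T}_{2}^{j}. \tag{64}$$

The complete set of tuples causing a broadcast error is

$$\mathcal{T} \triangleq \mathcal{T}_{1v} \cup \mathcal{T}_{1} \cup \mathcal{T}_{2}. \tag{65}$$

The goal is to show that the probability of choosing tuples of
error sequences in the set $\mathcal{T}$ is small under the distribution
induced by the broadcast code.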



G. Average Error Probability 

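Denote the total sum rate of the broadcast protocol as $R_\Sigma =
R_1 + R_2$. Consider first the use of fixed deterministic maps
$\psi_1^{(j)}$ and $\psi_2^{(j)}$ shared between the encoder and decoders. Then
the probability of error of broadcasting the two messages at
rates $R_1$ and $R_2$ is given by

$$P_e^{(n)}\big[\{\psi_1^{(j)}, \psi_2^{(j)}\}\big] = \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} \Bigg[ P_{Y_1^n Y_2^n | U_1^n U_2^n}\big(y_1^n, y_2^n \,\big|\, u_1^n, u_2^n\big) \cdot \frac{1}{2^{nR_2}} \prod_{j \in [n] : j \notin \mathcal{M}^{(n)}_2} \mathbb{1}\big[\psi_2^{(j)}(u_2^{1:j-1}) = u_2(j)\big] \cdot \frac{1}{2^{nR_1}} \prod_{j \in [n] : j \notin \mathcal{M}^{(n)}_1} \mathbb{1}\big[\psi_1^{(j)}(u_1^{1:j-1}, u_2^n G_n) = u_1(j)\big] \Bigg].$$

If the encoder and decoders share randomized maps $\Psi_1^{(j)}$
and $\Psi_2^{(j)}$, then the average probability of error is a random
quantity determined as follows

$$P_e^{(n)}\big[\{\Psi_1^{(j)}, \Psi_2^{(j)}\}\big] = \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} \Bigg[ P_{Y_1^n Y_2^n | U_1^n U_2^n}\big(y_1^n, y_2^n \,\big|\, u_1^n, u_2^n\big) \cdot \frac{1}{2^{nR_2}} \prod_{j \in [n] : j \notin \mathcal{M}^{(n)}_2} \mathbb{1}\big[\Psi_2^{(j)}(u_2^{1:j-1}) = u_2(j)\big] \cdot \frac{1}{2^{nR_1}} \prod_{j \in [n] : j \notin \mathcal{M}^{(n)}_1} \mathbb{1}\big[\Psi_1^{(j)}(u_1^{1:j-1}, u_2^n G_n) = u_1(j)\big] \Bigg].$$

By averaging over the randomness in the encoders and
decoders, the expected block error probability $P_e^{(n)}[\{\Psi_1^{(j)}, \Psi_2^{(j)}\}]$
is upper bounded in the following lemma.

Lemma 6: Consider the polarization-based superposition
code described in Section VII-C and Section VII-D. Let
$R_1$ and $R_2$ be the broadcast rates selected according to
the Bhattacharyya criterion given in Proposition 5. Then for
$0 < \beta < 1$ and sufficiently large $n$,

$$\mathbb{E}_{\{\Psi_1^{(j)}, \Psi_2^{(j)}\}} \Big[ P_e^{(n)}\big[\{\Psi_1^{(j)}, \Psi_2^{(j)}\}\big] \Big] < 2^{-n^{\beta}}.$$

Proof: See Section C of the Appendices. ■

If the average probability of error decays to zero in expectation
over the random maps $\{\Psi_1^{(j)}\}$ and $\{\Psi_2^{(j)}\}$, there must
exist at least one fixed set of maps for which $P_e^{(n)} \to 0$.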



VIII. Noisy Broadcast Channels 
Marton's Coding Scheme 

A. Marton's Inner Bound

For general noisy broadcast channels, Marton's inner bound
involves two correlated auxiliary random variables $V_1$ and
$V_2$ [3]. The intuition behind the coding strategy is to identify
two "virtual" channels, one from $V_1$ to $Y_1$, and the other
from $V_2$ to $Y_2$. Somewhat surprisingly, although the broadcast
messages are independent, the auxiliary random variables $V_1$
and $V_2$ may be correlated to increase rates to both receivers.
While there exist generalizations of Marton's strategy, the
basic version of the inner bound is presented in this section.†

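Proposition 6 (Marton's Inner Bound): For any two-user
DM-BC, the rates $(R_1,R_2) \in \mathbb{R}_+^2$ in the pentagonal region
$\mathcal{R}(X, V_1, V_2, Y_1, Y_2)$ are achievable where

$$\mathcal{R}(X, V_1, V_2, Y_1, Y_2) \triangleq \Big\{ R_1, R_2 \;\Big|\; R_1 \leq I(V_1;Y_1), \;\; R_2 \leq I(V_2;Y_2), \;\; R_1 + R_2 \leq I(V_1;Y_1) + I(V_2;Y_2) - I(V_1;V_2) \Big\}, \tag{66}$$

and where $X, V_1, V_2, Y_1, Y_2$ have a joint distribution given by
$P_{V_1V_2}(v_1,v_2)\, P_{X|V_1V_2}(x|v_1,v_2)\, P_{Y_1Y_2|X}(y_1,y_2|x)$.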

Remark 11: It can be shown that for Marton's inner
bound there is no loss of generality if $P_{X|V_1V_2}(x|v_1,v_2) =
\mathbb{1}[x = \phi(v_1,v_2)]$ where $\phi(v_1,v_2)$ is a deterministic function [2,
Section 8.3]. Thus, by allowing a larger alphabet size for the
auxiliaries, $X$ may be a deterministic function of auxiliaries
$(V_1, V_2)$. Marton's inner bound is tight for the class of
semi-deterministic DM-BCs for which one of the outputs is a
deterministic function of the input.



B. Main Result 

Theorem 3 (Polarization-Based Marton Code): Consider
any two-user DM-BC with arbitrary input and output
alphabets. There exist sequences of polar broadcast codes
over $n$ channel uses which achieve the following rate region

$$\mathcal{R}(V_1, V_2, X, Y_1, Y_2) \triangleq \Big\{ R_1, R_2 \;\Big|\; R_1 \leq I(V_1;Y_1), \;\; R_2 \leq I(V_2;Y_2) - I(V_1;V_2) \Big\}, \tag{67}$$

where random variables $V_1, V_2, X, Y_1, Y_2$ have the following
listed properties:

• $V_1$ and $V_2$ are binary random variables.

• $P_{Y_2|V_2}(y_2|v_2) \succ P_{V_1|V_2}(v_1|v_2)$.

• For a deterministic function $\phi : \{0,1\}^2 \to \mathcal{X}$, the joint
distribution of all random variables is given by

$$P_{V_1V_2XY_1Y_2}(v_1,v_2,x,y_1,y_2) = P_{V_1V_2}(v_1,v_2)\, \mathbb{1}[x = \phi(v_1,v_2)]\, P_{Y_1Y_2|X}(y_1,y_2|x).$$

For $0 < \beta < \frac{1}{2}$, the average error probability of this code
sequence decays as $P_e^{(n)} = O(2^{-n^{\beta}})$. The complexity of
encoding and decoding is $O(n \log n)$.

Remark 12: The listed property $P_{Y_2|V_2}(y_2|v_2) \succ
P_{V_1|V_2}(v_1|v_2)$ is necessary in the proof due to
polarization-based codes requiring an alignment of polarization indices.
The property is a natural restriction since it also implies that
$I(Y_2;V_2) > I(V_1;V_2)$ so that $R_2 > 0$. However, certain joint
distributions on random variables are not permitted using the
analysis of polarization presented here. It is not clear whether
a different approach obviates the need for an alignment of
indices.

†In addition, it is difficult even to evaluate Marton's inner bound for general
channels due to the need for proper cardinality bounds on the auxiliaries [37].
These issues lie outside the scope of the present paper.

Fig. 6. Block diagram of a polarization-based Marton code for a two-user noisy broadcast channel.

Remark 13: By symmetry, the rate tuple $(R_1,R_2) =
\big(I(V_1;Y_1) - I(V_1;V_2),\; I(V_2;Y_2)\big)$ is achievable with
low-complexity codes under similar constraints on the joint
distribution of $V_1, V_2, X, Y_1, Y_2$. The rate tuple is a corner point
of the pentagonal rate region of Marton's inner bound given
in (66).
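The corner point in (67) can be evaluated for any finite joint distribution. The Python sketch below is illustrative only: the function names, the dictionary-based representation of $P_{V_1V_2}$, and the per-receiver channel tables `w1` and `w2` (the marginals $P_{Y_1|X}$ and $P_{Y_2|X}$) are assumptions made for the example. It does not check the alignment condition $P_{Y_2|V_2} \succ P_{V_1|V_2}$ required by Theorem 3.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), q in joint.items():
        pa[a] = pa.get(a, 0.0) + q
        pb[b] = pb.get(b, 0.0) + q
    return sum(q * math.log2(q / (pa[a] * pb[b]))
               for (a, b), q in joint.items() if q > 0.0)

def marton_corner(p_v, phi, w1, w2):
    """Return (I(V1;Y1), I(V2;Y2) - I(V1;V2)) with X = phi(V1, V2).

    p_v: {(v1, v2): prob}; w1[x] and w2[x]: {y: prob} channel rows.
    """
    joint_v1_y1, joint_v2_y2 = {}, {}
    for (v1, v2), q in p_v.items():
        x = phi(v1, v2)
        for y, t in w1[x].items():
            joint_v1_y1[(v1, y)] = joint_v1_y1.get((v1, y), 0.0) + q * t
        for y, t in w2[x].items():
            joint_v2_y2[(v2, y)] = joint_v2_y2.get((v2, y), 0.0) + q * t
    r1 = mutual_information(joint_v1_y1)
    r2 = mutual_information(joint_v2_y2) - mutual_information(p_v)
    return r1, r2
```

As a sanity check on a toy deterministic channel with three inputs, where $Y_1 = V_1$ and $Y_2 = V_2$ and the three admissible pairs are equally likely, the two rates sum to $H(Y_1, Y_2) = \log_2 3$ bits.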

IX. Proof of Theorem 3

The block diagram for polarization-based Marton coding
is given in Figure 6. Marton's strategy differs from Cover's
superposition coding with the presence of two auxiliaries and
the function $\phi(v_1,v_2)$ which forms the codeword
symbol-by-symbol. The polar transform is applied to each $n$-length i.i.d.
sequence of auxiliary random variables.



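A. Polar Transform

Consider the i.i.d. sequence of random variables
$(V_1^j, V_2^j, X^j, Y_1^j, Y_2^j) \sim
P_{V_1V_2}(v_1,v_2)\, P_{X|V_1V_2}(x|v_1,v_2)\, P_{Y_1Y_2|X}(y_1,y_2|x)$ where
the index $j \in [n]$. For the particular coding strategy analyzed
in this section, $P_{X|V_1V_2}(x|v_1,v_2) = \mathbb{1}[x = \phi(v_1,v_2)]$. Let the
$n$-length sequence of auxiliary variables $(V_1^j, V_2^j)$ be organized
into the random matrix

$$\Omega \triangleq \begin{bmatrix} V_1^1 & V_1^2 & V_1^3 & \cdots & V_1^n \\ V_2^1 & V_2^2 & V_2^3 & \cdots & V_2^n \end{bmatrix}. \tag{68}$$

Applying the polar transform to $\Omega$ results in the random matrix
$\mathbf{U} = \Omega G_n$. Index the random variables of $\mathbf{U}$ as follows:

$$\mathbf{U} = \begin{bmatrix} U_1^1 & U_1^2 & U_1^3 & \cdots & U_1^n \\ U_2^1 & U_2^2 & U_2^3 & \cdots & U_2^n \end{bmatrix}. \tag{69}$$

The above definitions are consistent with the block diagram
given in Figure 6 (and noting that $G_n = G_n^{-1}$). The polar
transform extracts the randomness of $\Omega$. In the transformed
domain, the joint distribution of the variables in $\mathbf{U}$ is given
by

$$P_{U_1^n U_2^n}(u_1^n, u_2^n) = P_{V_1^n V_2^n}(u_1^n G_n, u_2^n G_n). \tag{70}$$

However, for polar coding purposes, the joint distribution is
decomposed as follows,

$$P_{U_1^n U_2^n}(u_1^n, u_2^n) = P_{U_1^n}(u_1^n)\, P_{U_2^n|U_1^n}(u_2^n|u_1^n) = \prod_{j=1}^{n} P\big(u_1(j) \,\big|\, u_1^{1:j-1}\big)\, P\big(u_2(j) \,\big|\, u_2^{1:j-1}, u_1^n\big). \tag{71}$$

The above conditional distributions may be computed
efficiently using recursive protocols. The polarized random
variables of $\mathbf{U}$ do not have an i.i.d. distribution.

B. Effective Channel

Marton's achievable strategy establishes virtual channels
for the two receivers via the function $\phi(v_1,v_2)$. The virtual
channel is given by

$$P^{\phi}_{Y_1Y_2|V_1V_2}\big(y_1, y_2 \,\big|\, v_1, v_2\big) \triangleq P_{Y_1Y_2|X}\big(y_1, y_2 \,\big|\, \phi(v_1,v_2)\big).$$

Due to the memoryless property of the DM-BC, the effective
channel between auxiliaries and channel outputs is given by

$$P^{\phi}_{Y_1^nY_2^n|V_1^nV_2^n}\big(y_1^n, y_2^n \,\big|\, v_1^n, v_2^n\big) = \prod_{i=1}^{n} P_{Y_1Y_2|X}\big(y_1(i), y_2(i) \,\big|\, \phi(v_1(i), v_2(i))\big).$$

The polarization-based Marton code establishes a different
effective channel between polar-transformed auxiliaries and
the channel outputs. The effective polarized channel is

$$P^{\phi}_{Y_1^nY_2^n|U_1^nU_2^n}\big(y_1^n, y_2^n \,\big|\, u_1^n, u_2^n\big) \triangleq P^{\phi}_{Y_1^nY_2^n|V_1^nV_2^n}\big(y_1^n, y_2^n \,\big|\, u_1^n G_n, u_2^n G_n\big). \tag{72}$$

C. Polarization Theorems Revisited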



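Definition 14 (Polarization Sets for Marton Coding): Let
$V_1^n, V_2^n, X^n, Y_1^n, Y_2^n$ be the sequence of random variables as
introduced in Section IX-A. In addition, let $U_1^n = V_1^n G_n$ and
$U_2^n = V_2^n G_n$. Let $\delta_n = 2^{-n^{\beta}}$ for $0 < \beta < \frac{1}{2}$. The following
polarization sets are defined:

$$\mathcal{H}^{(n)}_{V_1} \triangleq \Big\{ j \in [n] : Z\big(U_1(j) \,\big|\, U_1^{1:j-1}\big) \geq 1 - \delta_n \Big\},$$
$$\mathcal{L}^{(n)}_{V_1|Y_1} \triangleq \Big\{ j \in [n] : Z\big(U_1(j) \,\big|\, U_1^{1:j-1}, Y_1^n\big) \leq \delta_n \Big\},$$
$$\mathcal{H}^{(n)}_{V_2|V_1} \triangleq \Big\{ j \in [n] : Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, V_1^n\big) \geq 1 - \delta_n \Big\},$$
$$\mathcal{L}^{(n)}_{V_2|V_1} \triangleq \Big\{ j \in [n] : Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, V_1^n\big) \leq \delta_n \Big\},$$
$$\mathcal{H}^{(n)}_{V_2|Y_2} \triangleq \Big\{ j \in [n] : Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, Y_2^n\big) \geq 1 - \delta_n \Big\},$$
$$\mathcal{L}^{(n)}_{V_2|Y_2} \triangleq \Big\{ j \in [n] : Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, Y_2^n\big) \leq \delta_n \Big\}.$$

Fig. 7. The alignment of polarization indices for Marton coding over noisy broadcast channels with respect to the second receiver. The message set $\mathcal{M}^{(n)}_2$
is highlighted by the vertical red rectangles. At finite code length $n$, exact alignment is not possible due to partially-polarized indices pictured in gray.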

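Definition 15 (Message Sets for Marton Coding): In terms
of the polarization sets given in Definition 14, the following
message sets are defined:

$$\mathcal{M}^{(n)}_{1} \triangleq \mathcal{H}^{(n)}_{V_1} \cap \mathcal{L}^{(n)}_{V_1|Y_1}, \tag{73}$$
$$\mathcal{M}^{(n)}_{2} \triangleq \mathcal{H}^{(n)}_{V_2|V_1} \cap \mathcal{L}^{(n)}_{V_2|Y_2}. \tag{74}$$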



Proposition 7 (Polarization): Consider the polarization sets
given in Definition 14 and the message sets given in
Definition 15 with parameter $\delta_n = 2^{-n^{\beta}}$ for $0 < \beta < \frac{1}{2}$. Fix a
constant $\tau > 0$. Then there exists an $N_0 = N_0(\beta, \tau)$ such that

$$\frac{1}{n} \big| \mathcal{M}^{(n)}_{1} \big| \geq \big( H(V_1) - H(V_1|Y_1) \big) - \tau, \tag{75}$$
$$\frac{1}{n} \big| \mathcal{M}^{(n)}_{2} \big| \geq \big( H(V_2|V_1) - H(V_2|Y_2) \big) - \tau, \tag{76}$$

for all $n > N_0$.

Lemma 7: Consider the polarization sets defined in
Proposition 7. If the property $P_{Y_2|V_2}(y_2|v_2) \succ P_{V_1|V_2}(v_1|v_2)$ holds
for conditional distributions $P_{Y_2|V_2}(y_2|v_2)$ and $P_{V_1|V_2}(v_1|v_2)$,
then $I(V_2;Y_2) \geq I(V_1;V_2)$ and the Bhattacharyya parameters satisfy

$$Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, Y_2^n\big) \leq Z\big(U_2(j) \,\big|\, U_2^{1:j-1}, V_1^n\big)$$

for all $j \in [n]$. As a result,

$$\mathcal{L}^{(n)}_{V_2|V_1} \subseteq \mathcal{L}^{(n)}_{V_2|Y_2}, \qquad \mathcal{H}^{(n)}_{V_2|Y_2} \subseteq \mathcal{H}^{(n)}_{V_2|V_1}.$$

Proof: The proof follows from Lemma 12 and repeated
application of Lemma 13 in Appendix A. ■

Remark 14: The alignment of polarization indices
characterized by Lemma 7 is diagrammed in Figure 7. The alignment
ensures the existence of polarization indices in the set $\mathcal{M}^{(n)}_2$
for the message $W_2$ to have a positive rate $R_2 > 0$. The
indices in $\mathcal{M}^{(n)}_2$ represent those bits freely set at the broadcast
encoder and simultaneously those bits that may be decoded by
$\mathcal{D}_2$ given its observations.

D. Partially-Polarized Indices 

As shown in Figure 7 for the Marton coding scheme, exact
alignment of polarization indices is not possible. However, the
alignment holds for all but $o(n)$ indices. The sets of
partially-polarized indices shown in Figure 7 are defined as follows.

Definition 16 (Sets of Partially-Polarized Indices):

$$\Delta_1 \triangleq [n] \setminus \Big( \mathcal{H}^{(n)}_{V_2|V_1} \cup \mathcal{L}^{(n)}_{V_2|V_1} \Big), \tag{77}$$
$$\Delta_2 \triangleq [n] \setminus \Big( \mathcal{H}^{(n)}_{V_2|Y_2} \cup \mathcal{L}^{(n)}_{V_2|Y_2} \Big). \tag{78}$$

As implied by Arikan's polarization theorems, the number of
partially-polarized indices is negligible asymptotically as $n \to
\infty$. For an arbitrarily small $\eta > 0$,

$$\frac{|\Delta_1 \cup \Delta_2|}{n} \leq \eta \tag{79}$$

for all $n$ sufficiently large. As will be discussed,
providing these $o(n)$ bits as "genie-given" bits to the decoders
results in a rate penalty; however, the rate penalty is negligible
for sufficiently large code lengths.
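Given the Bhattacharyya parameters for each index, the sets in Definition 16 reduce to simple set arithmetic. The sketch below assumes the parameters have already been computed and are supplied as lists (0-based indices); the function names and arguments are illustrative only.

```python
def polarization_partition(z, delta):
    """Split indices into (high, low, partial) using thresholds 1 - delta and delta."""
    high = {j for j, value in enumerate(z) if value >= 1.0 - delta}
    low = {j for j, value in enumerate(z) if value <= delta}
    partial = set(range(len(z))) - high - low
    return high, low, partial

def genie_fraction(z_given_v1, z_given_y2, delta):
    """Fraction |Delta_1 union Delta_2| / n of partially-polarized indices.

    z_given_v1[j] stands for Z(U2(j) | U2^{1:j-1}, V1^n) and
    z_given_y2[j] stands for Z(U2(j) | U2^{1:j-1}, Y2^n).
    """
    _, _, delta_1 = polarization_partition(z_given_v1, delta)
    _, _, delta_2 = polarization_partition(z_given_y2, delta)
    return len(delta_1 | delta_2) / len(z_given_v1)
```

The returned fraction is the quantity that (79) requires to fall below $\eta$ for large $n$.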



E. Broadcast Encoding Based on Polarization 

As diagrammed in Figure 6, the broadcast encoder must
map two independent messages $(W_1, W_2)$ uniformly
distributed over $[2^{nR_1}] \times [2^{nR_2}]$ to a codeword $x^n \in \mathcal{X}^n$ in such
a way that the decoding at each separate receiver is successful.
The achievable rates for a particular block length $n$ are

$$R_1 = \frac{1}{n} \big| \mathcal{M}^{(n)}_{1} \big|, \qquad R_2 = \frac{1}{n} \big| \mathcal{M}^{(n)}_{2} \big|.$$

To construct a codeword, the encoder first produces two
binary sequences $u_1^n \in \{0,1\}^n$ and $u_2^n \in \{0,1\}^n$. To
determine $u_1(j)$ for $j \in \mathcal{M}^{(n)}_1$, the bit is selected as a uniformly
distributed message bit intended for the first receiver. To
determine $u_2(j)$ for $j \in \mathcal{M}^{(n)}_2$, the bit is selected as a
uniformly distributed message bit intended for the second
receiver. The remaining non-message indices of $u_1^n$ and $u_2^n$ are
decided randomly according to the proper statistics as will be
described in this section. The transmitted codeword is formed
symbol-by-symbol via the $\phi$ function,

$$\forall j \in [n] : \; x(j) = \phi\big(v_1(j), v_2(j)\big),$$

where $v_1^n = u_1^n G_n$ and $v_2^n = u_2^n G_n$. A valid codeword
sequence is always guaranteed to be formed unlike in the case
of coding for deterministic broadcast channels.

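1) Random Mapping: To fill in the non-message indices,
we define the following random mappings. Consider the
following class of random boolean functions where $j \in [n]$:

$$\Psi_1^{(j)} : \{0,1\}^{j-1} \to \{0,1\}, \tag{80}$$
$$\Psi_2^{(j)} : \{0,1\}^{n+j-1} \to \{0,1\}, \tag{81}$$
$$\Gamma : [n] \to \{0,1\}. \tag{82}$$

More concretely, we consider the following specific random
boolean functions based on the statistics derived from
polarization methods:

$$\Psi_1^{(j)}\big(u_1^{1:j-1}\big) \triangleq \begin{cases} 0, & \text{w.p. } \lambda_0\big(u_1^{1:j-1}\big), \\ 1, & \text{w.p. } 1 - \lambda_0\big(u_1^{1:j-1}\big), \end{cases} \tag{83}$$
$$\Psi_2^{(j)}\big(u_2^{1:j-1}, v_1^n\big) \triangleq \begin{cases} 0, & \text{w.p. } \lambda_0\big(u_2^{1:j-1}, v_1^n\big), \\ 1, & \text{w.p. } 1 - \lambda_0\big(u_2^{1:j-1}, v_1^n\big), \end{cases} \tag{84}$$
$$\Gamma(j) \triangleq \begin{cases} 0, & \text{w.p. } \frac{1}{2}, \\ 1, & \text{w.p. } \frac{1}{2}, \end{cases} \tag{85}$$

where

$$\lambda_0\big(u_1^{1:j-1}\big) \triangleq P\big(U_1(j) = 0 \,\big|\, U_1^{1:j-1} = u_1^{1:j-1}\big),$$
$$\lambda_0\big(u_2^{1:j-1}, v_1^n\big) \triangleq P\big(U_2(j) = 0 \,\big|\, U_2^{1:j-1} = u_2^{1:j-1}, V_1^n = v_1^n\big).$$

For a fixed $j \in [n]$, the random boolean functions $\Psi_1^{(j)}$, $\Psi_2^{(j)}$
may be thought of as a vector of independent Bernoulli random
variables indexed by the input to the function. Each Bernoulli
random variable of the vector is zero or one with a fixed
well-defined probability that is efficiently computable. The random
boolean function $\Gamma$ may be thought of as an $n$-length vector
of Bernoulli($\frac{1}{2}$) random variables.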

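2) Encoding Protocol: The broadcast encoder constructs
the sequence $u_1^n$ bit-by-bit successively,

$$u_1(j) = \begin{cases} W_1 \text{ message bit}, & \text{if } j \in \mathcal{M}^{(n)}_1, \\ \Psi_1^{(j)}\big(u_1^{1:j-1}\big), & \text{otherwise.} \end{cases} \tag{86}$$

The encoder then computes the sequence $v_1^n = u_1^n G_n$. To
generate $v_2^n$, the encoder constructs the sequence $u_2^n$ (given
$v_1^n$) as follows.

$$u_2(j) = \begin{cases} W_2 \text{ message bit}, & \text{if } j \in \mathcal{M}^{(n)}_2, \\ \Gamma(j), & \text{if } j \in \mathcal{H}^{(n)}_{V_2|V_1} \setminus \mathcal{M}^{(n)}_2, \\ \Psi_2^{(j)}\big(u_2^{1:j-1}, v_1^n\big), & \text{otherwise.} \end{cases} \tag{87}$$

Then the sequence $v_2^n = u_2^n G_n$. The randomness in the above
encoding protocol over non-message indices ensures that the
pair of sequences $(u_1^n, u_2^n)$ has the correct statistics as if drawn
from the joint distribution of $(U_1^n, U_2^n)$. In the last step, the
encoder transmits a codeword $x^n$ formed symbol-by-symbol:
$x(j) = \phi(v_1(j), v_2(j))$ for all $j \in [n]$. For $j \in \Delta_2$, where
$\Delta_2$ is the set of partially-polarized indices defined in (78), the
encoder records the realization of $u_2(j)$. These indices will be
provided to the second receiver's decoder $\mathcal{D}_2$ as "genie-given"
bits.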
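The encoding protocol can be sketched as a short routine. The sketch assumes a three-way rule for $u_2(j)$: message bit on $\mathcal{M}^{(n)}_2$, the shared map $\Gamma$ on the rest of $\mathcal{H}^{(n)}_{V_2|V_1}$, and $\Psi_2^{(j)}$ elsewhere, which is the reading consistent with the decoder of the second receiver and with the measure $Q$ used in the analysis. All callables (`psi1`, `psi2`, `gamma`, `phi`) are placeholders supplied by the caller, $G_n$ is assumed to be the plain Kronecker power of $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, and indices are 0-based.

```python
def polar_transform(bits):
    """bits * G_n over GF(2), assuming G_n is a Kronecker power of [[1,0],[1,1]]."""
    n = len(bits)
    if n == 1:
        return list(bits)
    half = n // 2
    mixed = [a ^ b for a, b in zip(bits[:half], bits[half:])]
    return polar_transform(mixed) + polar_transform(bits[half:])

def marton_encode(n, m1, w1, m2, w2, high_v2_given_v1,
                  psi1, psi2, gamma, phi, delta2):
    """Return (x, genie_bits) for messages w1, w2 placed on sets m1, m2."""
    bits1, bits2 = iter(w1), iter(w2)
    u1 = []
    for j in range(n):
        u1.append(next(bits1) if j in m1 else psi1(j, tuple(u1)))
    v1 = polar_transform(u1)
    u2 = []
    for j in range(n):
        if j in m2:
            bit = next(bits2)                       # message bit for receiver 2
        elif j in high_v2_given_v1:
            bit = gamma(j)                          # shared Bernoulli(1/2) bit
        else:
            bit = psi2(j, tuple(u2), tuple(v1))     # depends on prefix and v1
        u2.append(bit)
    v2 = polar_transform(u2)
    x = [phi(a, b) for a, b in zip(v1, v2)]         # symbol-by-symbol codeword
    genie_bits = {j: u2[j] for j in delta2}         # recorded for decoder D2
    return x, genie_bits
```

Because $\phi$ is applied symbol-by-symbol to any pair $(v_1^n, v_2^n)$, the routine always returns a valid codeword, as noted in the text.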

F. Broadcast Decoding Based on Polarization 

1) Decoding At First Receiver: Decoder $\mathcal{D}_1$ decodes the
binary sequence $\hat{u}_1^n$ using its observations $y_1^n$. The message
$W_1$ is located at the indices $j \in \mathcal{M}^{(n)}_1$ in the sequence $\hat{u}_1^n$.
More precisely, we define the following deterministic polar
decoding function for the $j$-th bit:

$$\xi_{u_1}^{(j)}\big(u_1^{1:j-1}, y_1^n\big) \triangleq \arg\max_{u \in \{0,1\}} \Big\{ P\big(U_1(j) = u \,\big|\, U_1^{1:j-1} = u_1^{1:j-1}, Y_1^n = y_1^n\big) \Big\}. \tag{88}$$

Decoder $\mathcal{D}_1$ reconstructs $\hat{u}_1^n$ bit-by-bit successively as follows
using the identical random mapping $\Psi_1^{(j)}$ at the encoder:

$$\hat{u}_1(j) = \begin{cases} \xi_{u_1}^{(j)}\big(\hat{u}_1^{1:j-1}, y_1^n\big), & \text{if } j \in \mathcal{M}^{(n)}_1, \\ \Psi_1^{(j)}\big(\hat{u}_1^{1:j-1}\big), & \text{otherwise.} \end{cases} \tag{89}$$

Given that all previous bits $\hat{u}_1^{1:j-1}$ have been decoded
correctly, decoder $\mathcal{D}_1$ makes a mistake on the $j$-th bit $\hat{u}_1(j)$ only
if $j \in \mathcal{M}^{(n)}_1$. For the remaining indices, the decoder produces
the same bit produced at the encoder due to the shared random
maps.

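2) Decoding At Second Receiver: The decoder $\mathcal{D}_2$ decodes
the binary sequence $\hat{u}_2^n$ using observations $y_2^n$. The message
$W_2$ is located at the indices $j \in \mathcal{M}^{(n)}_2$ of the sequence $\hat{u}_2^n$.
Define the following deterministic polar decoding functions

$$\xi_{u_2}^{(j)}\big(u_2^{1:j-1}, y_2^n\big) \triangleq \arg\max_{u \in \{0,1\}} \Big\{ P\big(U_2(j) = u \,\big|\, U_2^{1:j-1} = u_2^{1:j-1}, Y_2^n = y_2^n\big) \Big\}. \tag{90}$$

Decoder $\mathcal{D}_2$ reconstructs $\hat{u}_2^n$ bit-by-bit successively as follows
using the identical shared random mapping $\Gamma$ used at the
encoder. Including all but $o(n)$ of the indices,

$$\hat{u}_2(j) = \begin{cases} \xi_{u_2}^{(j)}\big(\hat{u}_2^{1:j-1}, y_2^n\big), & \text{if } j \in \mathcal{L}^{(n)}_{V_2|Y_2}, \\ \Gamma(j), & \text{if } j \in \mathcal{H}^{(n)}_{V_2|Y_2}. \end{cases} \tag{91}$$

For those indices $j \in \Delta_2$ where $\Delta_2$ is the set of
partially-polarized indices defined in (78), the decoder $\mathcal{D}_2$ is provided
with "genie-given" bits from the encoder. Thus, all bits are
decoded, and $\mathcal{D}_2$ only makes a successive cancelation error for
those indices $j \in \mathcal{L}^{(n)}_{V_2|Y_2}$. Communicating the genie-given bits
from the encoder to decoder results in a rate penalty. However,
since the number of genie-given bits scales asymptotically as
$o(n)$, the rate penalty can be made arbitrarily small.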

Remark 15: It is notable that decoder $\mathcal{D}_2$ reconstructs $\hat{u}_2^n$
using only the observations $y_2^n$. At the encoder, the sequence
$u_2^n$ was generated with the realization of a sequence $v_1^n$ as
given in (87). However, decoder $\mathcal{D}_2$ does not reconstruct
the sequence $\hat{u}_1^n$. From this operational perspective, Marton's
scheme differs crucially from Cover's superposition scheme
because there does not exist the notion of a "stronger" receiver
which reconstructs all the sequences decoded at the "weaker"
receiver.
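The control flow at the second receiver can be sketched as below. This is a structural illustration under stated assumptions: `map_rule(j, prefix)` stands in for the decoding function (90) with the observations $y_2^n$ already bound, `gamma` is the shared map $\Gamma$, `genie_bits` maps each index of $\Delta_2$ to the bit recorded at the encoder, and indices are 0-based. None of these names come from the paper.

```python
def decode_second_receiver(n, low_set, high_set, genie_bits, map_rule, gamma):
    """Reconstruct u2_hat bit by bit from low/high sets and genie-given bits."""
    u2_hat = []
    for j in range(n):
        if j in genie_bits:
            bit = genie_bits[j]                 # partially-polarized index
        elif j in high_set:
            bit = gamma(j)                      # shared random bit, no decoding
        elif j in low_set:
            bit = map_rule(j, tuple(u2_hat))    # MAP decision from y2 and prefix
        else:
            raise ValueError("index %d is not covered by any set" % j)
        u2_hat.append(bit)
    return u2_hat
```

Note that the routine never uses $v_1^n$ or $\Psi_2^{(j)}$, which matches the observation in Remark 15 that $\mathcal{D}_2$ works from $y_2^n$ alone.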



G. Total Variation Bound 



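To analyze the average probability of error $P_e^{(n)}$, it is
assumed that both the encoder and decoder share the randomized
mappings $\Psi_1^{(j)}$, $\Psi_2^{(j)}$, and $\Gamma$ (where $\Psi_2^{(j)}$ is not utilized at
decoder $\mathcal{D}_2$). Define the following probability measure on the
space of tuples of binary sequences.

$$Q(u_1^n, u_2^n) \triangleq Q(u_1^n)\, Q(u_2^n | u_1^n) = \prod_{j=1}^{n} Q\big(u_1(j) \,\big|\, u_1^{1:j-1}\big)\, Q\big(u_2(j) \,\big|\, u_2^{1:j-1}, u_1^n\big), \tag{92}$$

where the conditional probability measures are defined as

$$Q\big(u_1(j) \,\big|\, u_1^{1:j-1}\big) \triangleq \begin{cases} \frac{1}{2}, & \text{if } j \in \mathcal{M}^{(n)}_1, \\ P\big(u_1(j) \,\big|\, u_1^{1:j-1}\big), & \text{otherwise,} \end{cases}$$
$$Q\big(u_2(j) \,\big|\, u_2^{1:j-1}, u_1^n\big) \triangleq \begin{cases} \frac{1}{2}, & \text{if } j \in \mathcal{H}^{(n)}_{V_2|V_1}, \\ P\big(u_2(j) \,\big|\, u_2^{1:j-1}, u_1^n\big), & \text{otherwise.} \end{cases}$$

The probability measure $Q$ defined in (92) is a perturbation
of the joint probability measure $P_{U_1^n U_2^n}(u_1^n, u_2^n)$ in (71). The
only difference in definition between $P$ and $Q$ is due to those
indices in message sets $\mathcal{M}^{(n)}_1$ and $\mathcal{H}^{(n)}_{V_2|V_1}$ (note: $\mathcal{M}^{(n)}_2 \subseteq
\mathcal{H}^{(n)}_{V_2|V_1}$). The following lemma provides a bound on the total
variation distance between $P$ and $Q$. The lemma establishes
the fact that inserting uniformly distributed message bits in
the proper indices $\mathcal{M}^{(n)}_1$ and $\mathcal{M}^{(n)}_2$ (or the entire set $\mathcal{H}^{(n)}_{V_2|V_1}$)
at the encoder does not perturb the statistics of the $n$-length
random variables too much.

Lemma 8: (Total Variation Bound) Let probability measures
$P$ and $Q$ be defined as in (71) and (92) respectively.
Let $0 < \beta < 1$. For sufficiently large $n$, the total variation
distance between $P$ and $Q$ is bounded as

$$\sum_{\substack{u_1^n \in \{0,1\}^n \\ u_2^n \in \{0,1\}^n}} \Big| P_{U_1^n U_2^n}(u_1^n, u_2^n) - Q(u_1^n, u_2^n) \Big| \leq 2^{-n^{\beta}}.$$

Proof: Omitted. The proof follows via the chain rule
for KL-divergence and is identical to the previous proofs of
Lemma 1 and Lemma 5. ■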

H. Error Sequences 

The decoding protocols for $\mathcal{D}_1$ and $\mathcal{D}_2$ were established in Section IX-F. To analyze the probability of error of successive cancelation (SC) decoding, consider the sequences $u_1^n$ and $u_2^n$ formed at the encoder, and the resulting observations $y_1^n$ and $y_2^n$ received by the decoders. The effective polarized channel $P^{\phi}_{Y_1^n Y_2^n \mid U_1^n U_2^n}(y_1^n, y_2^n \mid u_1^n, u_2^n)$ was defined in (72) for a fixed $\phi$ function. It is convenient to group the sequences together and consider all tuples $(u_1^n, u_2^n, y_1^n, y_2^n)$.

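Decoder $\mathcal{D}_1$ makes an SC decoding error on the $j$-th bit for the following tuples:

$$
\mathcal{T}_1^j \triangleq \Big\{ (u_1^n, u_2^n, y_1^n, y_2^n) : P_{U_1(j) \mid U_1^{1:j-1} Y_1^n}\big(u_1(j) \mid u_1^{1:j-1}, y_1^n\big) \le P_{U_1(j) \mid U_1^{1:j-1} Y_1^n}\big(u_1(j) \oplus 1 \mid u_1^{1:j-1}, y_1^n\big) \Big\}. \tag{93}
$$

The set $\mathcal{T}_1^j$ represents those tuples causing an error at $\mathcal{D}_1$ in the case $u_1(j)$ is inconsistent with respect to observations $y_1^n$ and the decoding rule. Similarly, decoder $\mathcal{D}_2$ makes an SC decoding error on the $j$-th bit for the following tuples:

$$
\mathcal{T}_2^j \triangleq \Big\{ (u_1^n, u_2^n, y_1^n, y_2^n) : P_{U_2(j) \mid U_2^{1:j-1} Y_2^n}\big(u_2(j) \mid u_2^{1:j-1}, y_2^n\big) \le P_{U_2(j) \mid U_2^{1:j-1} Y_2^n}\big(u_2(j) \oplus 1 \mid u_2^{1:j-1}, y_2^n\big) \Big\}.
$$

The set $\mathcal{T}_2^j$ represents those tuples causing an error at $\mathcal{D}_2$ in the case $u_2(j)$ is inconsistent with respect to observations $y_2^n$ and the decoding rule. The set of tuples causing an error is

$$
\mathcal{T}_1 \triangleq \bigcup_{j \in \mathcal{M}_1^{(n)}} \mathcal{T}_1^j, \tag{94}
$$

$$
\mathcal{T}_2 \triangleq \bigcup_{j \in \mathcal{L}_{V_2 \mid Y_2}^{(n)}} \mathcal{T}_2^j, \tag{95}
$$

$$
\mathcal{T} \triangleq \mathcal{T}_1 \cup \mathcal{T}_2. \tag{96}
$$

The goal is to show that the probability of choosing tuples of error sequences in the set $\mathcal{T}$ is small under the distribution induced by the broadcast code.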

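I. Average Error Probability

If the encoder and decoders share randomized maps $\Psi_1^{(j)}$, $\Psi_2^{(j)}$, and $\Gamma$, then the average probability of error is a random quantity determined as follows

$$
P_e^{(n)}\big[\{\Psi_1^{(j)}\}, \{\Psi_2^{(j)}\}, \Gamma\big] = \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} \Bigg[ P^{\phi}_{Y_1^n Y_2^n \mid U_1^n U_2^n}(y_1^n, y_2^n \mid u_1^n, u_2^n) \cdot \frac{1}{2^{nR_1}} \prod_{j \in [n] : j \notin \mathcal{M}_1^{(n)}} \mathbb{1}\big[\Psi_1^{(j)}(u_1^{1:j-1}) = u_1(j)\big] \cdot \frac{1}{2^{nR_2}} \prod_{j \in \mathcal{H}_{V_2 \mid V_1}^{(n)} \setminus \mathcal{M}_2^{(n)}} \mathbb{1}\big[\Gamma(j) = u_2(j)\big] \cdot \prod_{j \in [n] : j \notin \mathcal{H}_{V_2 \mid V_1}^{(n)}} \mathbb{1}\big[\Psi_2^{(j)}(u_2^{1:j-1}, u_1^n G_n) = u_2(j)\big] \Bigg].
$$

By averaging over the randomness in the encoders and decoders, the expected block error probability $P_e^{(n)}\big[\{\Psi_1^{(j)}\}, \{\Psi_2^{(j)}\}, \Gamma\big]$ is upper bounded in the following lemma.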

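Lemma 9: Consider the polarization-based Marton code described in Section IX-E and Section IX-F. Let $R_1$ and $R_2$ be the broadcast rates selected according to the Bhattacharyya criterion given in the corresponding proposition. Then for $0 < \beta < \frac{1}{2}$ and sufficiently large $n$,

$$
\mathbb{E}_{\{\Psi_1^{(j)}\}, \{\Psi_2^{(j)}\}, \Gamma}\Big[ P_e^{(n)}\big[\{\Psi_1^{(j)}\}, \{\Psi_2^{(j)}\}, \Gamma\big] \Big] \le 2^{-n^{\beta}}.
$$

Proof: See Section D of the Appendices.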



If the average probability of block error decays to zero in expectation over the random maps $\{\Psi_1^{(j)}\}$, $\{\Psi_2^{(j)}\}$, and $\Gamma$, then there must exist at least one fixed set of maps for which $P_e^{(n)} \to 0$. Hence, polar codes for Marton's inner bound exist under suitable restrictions on distributions and they achieve reliable transmission according to the advertised rates (except for a small set of $o(n)$ polarization indices as is discussed next).

J. Rate Penalty Due to Partial Polarization 

Lemma 9 is true assuming that decoder $\mathcal{D}_2$ obtains "genie-given" bits for the set of indices $A_2$ defined in (78). The set $A_2$ represents those indices that are partially polarized and which cause a slight misalignment of polarization indices in the Marton scheme. Fortunately, the set $A_2$ contains a vanishing fraction of indices: $\frac{1}{n}|A_2| \le \eta$ for $\eta > 0$ arbitrarily small and $n$ sufficiently large. Therefore, a two-phase strategy suffices for sending the "genie-given" bits. In the first phase of communication, the encoder sends several n-length blocks while decoder $\mathcal{D}_2$ waits to decode. After several blocks of output sequences have accumulated at the decoder, the encoder transmits all the known bits in the set $A_2$ for all the first-phase transmissions. The encoder and decoder can use any reliable point-to-point polar code with non-vanishing rate for this communication. Having received the "genie-aided" bits in the second phase, the second receiver then decodes all the first-phase blocks. The number of blocks sent in the first phase is $O(\frac{1}{\eta})$. The rate penalty is $O(\eta)$ where $\eta$ can be made arbitrarily small. A similar argument was provided in [24] for designing polar codes for the Gelfand-Pinsker problem.
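The arithmetic behind the $O(\eta)$ rate penalty can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function name `two_phase_rate`, the parameter `side_rate` (the rate of the point-to-point code used in the second phase), and the modelling choice that the second phase carries no new message bits are hypothetical and are not taken from the paper.

```python
def two_phase_rate(rate, eta, side_rate, block_length, num_blocks):
    """Effective rate of a hypothetical two-phase scheme.

    Phase 1: num_blocks blocks of block_length channel uses at the given rate.
    Phase 2: at most eta * block_length genie bits per first-phase block are
    resent at side_rate bits per channel use (assumed to carry no new data).
    """
    first_phase_uses = num_blocks * block_length
    genie_bits = num_blocks * eta * block_length
    second_phase_uses = genie_bits / side_rate
    message_bits = rate * first_phase_uses
    return message_bits / (first_phase_uses + second_phase_uses)


rate, eta, side_rate, n = 0.5, 0.01, 0.25, 1024
# Roughly 1/eta first-phase blocks fill a single second-phase block of length n.
num_blocks = round(side_rate / eta)
effective = two_phase_rate(rate, eta, side_rate, n, num_blocks)
penalty = rate - effective
```

Under these assumptions the effective rate equals `rate / (1 + eta / side_rate)`, so the penalty is at most `rate * eta / side_rate`, which shrinks linearly with `eta`.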

X. Conclusion 
Coding for broadcast channels is fundamental to our understanding of communication systems. Broadcast codes based on polarization methods achieve rates on the capacity boundary for several classes of DM-BCs. In the case of m-user deterministic DM-BCs, polarization of random variables from the channel output provides the ability to extract uniformly random message bits while maintaining broadcast constraints at the encoder. As referenced in the literature, maintaining multi-user constraints for the DM-BC is a difficult task for traditional belief propagation algorithms and LDPC codes.

For two-user noisy DM-BCs, polar codes were designed based on Marton's coding strategy and Cover's superposition strategy. Constraints on auxiliary and input distributions were placed in both cases to ensure alignment of polarization indices in the multi-user setting. The asymptotic behavior of the average error probability was shown to be $P_e^{(n)} = O(2^{-n^{\beta}})$ with an encoding and decoding complexity of $O(n \log n)$. The next step is to supplement the theory with experimental evidence of the error-correcting capability of polar codes over simulated channels for finite code lengths. The results demonstrate that polar codes have a potential for use in several network communication scenarios.

Appendix A
Polar Coding Lemmas

The following lemmas provide a basis for proving polar 
coding theorems. A subset of the lemmas were proven in 
different contexts, e.g., channel vs. source coding, and contain 
citations to references. 

Lemma 10: Consider two random variables $X \in \{0, 1\}$ and $Y \in \mathcal{Y}$ with joint distribution $P_{XY}(x, y)$. Let $Q(x|y) = \frac{1}{2}$ denote a uniform conditional distribution for $x \in \{0, 1\}$ and $y \in \mathcal{Y}$. Then the following identity holds.

$$
D\big(P_{X|Y}(x|y) \,\|\, Q(x|y)\big) = 1 - H(X|Y). \tag{97}
$$



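Proof: The identity follows from standard definitions of entropy and Kullback-Leibler distance.

$$
\begin{aligned}
H(X|Y) &= \sum_{y \in \mathcal{Y}} P_Y(y) \sum_{x \in \{0,1\}} P_{X|Y}(x|y) \log_2 \frac{1}{P_{X|Y}(x|y)} \\
&= \sum_{y \in \mathcal{Y}} P_Y(y) \sum_{x \in \{0,1\}} P_{X|Y}(x|y) \left[ \log_2 \frac{1}{Q(x|y)} - \log_2 \frac{P_{X|Y}(x|y)}{Q(x|y)} \right] \\
&= \sum_{y \in \mathcal{Y}} P_Y(y) \left[ 1 - \sum_{x \in \{0,1\}} P_{X|Y}(x|y) \log_2 \frac{P_{X|Y}(x|y)}{Q(x|y)} \right] \\
&= 1 - D\big(P_{X|Y}(x|y) \,\|\, Q(x|y)\big).
\end{aligned}
$$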
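The identity of Lemma 10 can be checked numerically on a small example. The sketch below is illustrative only: the helper names and the example joint distribution are assumptions introduced here, not part of the paper.

```python
import math


def marginal_y(p_xy):
    """Marginal P_Y from a joint pmf given as {(x, y): probability}."""
    p_y = {}
    for (_, y), p in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return p_y


def cond_entropy_bits(p_xy):
    """H(X|Y) in bits."""
    p_y = marginal_y(p_xy)
    return -sum(p * math.log2(p / p_y[y]) for (_, y), p in p_xy.items() if p > 0)


def divergence_from_uniform_bits(p_xy):
    """D(P_{X|Y} || Q) in bits, averaged over P_Y, with Q(x|y) = 1/2."""
    p_y = marginal_y(p_xy)
    return sum(p * math.log2((p / p_y[y]) / 0.5) for (_, y), p in p_xy.items() if p > 0)


# Hypothetical joint distribution of a binary X and a two-letter Y.
example = {(0, "a"): 0.30, (1, "a"): 0.10, (0, "b"): 0.15, (1, "b"): 0.45}
lhs = divergence_from_uniform_bits(example)
rhs = 1.0 - cond_entropy_bits(example)
```

Both sides agree up to floating-point error, as the identity requires for any joint distribution with binary $X$.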



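Lemma 11 (Estimating The Bhattacharyya Parameter): Let $(T, V) \sim P_{TV}(t, v)$ where $T \in \{0, 1\}$ and $V \in \mathcal{V}$ where $\mathcal{V}$ is an arbitrary discrete alphabet. Define a likelihood function $L(v)$ and inverse likelihood function $L^{-1}(v)$ as follows.

$$
L(v) \triangleq \frac{P_{T|V}(0|v)}{P_{T|V}(1|v)}, \qquad L^{-1}(v) \triangleq \frac{P_{T|V}(1|v)}{P_{T|V}(0|v)}.
$$

To account for degenerate cases in which $P_{T|V}(t|v) = 0$, define the following function,

$$
\varphi(t, v) \triangleq
\begin{cases}
0, & \text{if } P_{T|V}(t|v) = 0, \\
L(v), & \text{if } P_{T|V}(t|v) > 0 \text{ and } t = 1, \\
L^{-1}(v), & \text{if } P_{T|V}(t|v) > 0 \text{ and } t = 0.
\end{cases} \tag{98}
$$

In order to estimate $Z(T|V) \in [0, 1]$, it is convenient to sample from $P_{TV}(t, v)$ and express $Z(T|V)$ as an expectation over random variables $T$ and $V$,

$$
Z(T|V) = \mathbb{E}_{T,V}\Big[ \sqrt{\varphi(T, V)} \Big]. \tag{99}
$$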



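Proof: The following forms of the Bhattacharyya parameter are equivalent.

$$
\begin{aligned}
Z(T|V) &\triangleq 2 \sum_{v \in \mathcal{V}} P_V(v) \sqrt{P_{T|V}(0|v)\, P_{T|V}(1|v)} \\
&= 2 \sum_{v \in \mathcal{V}} \sqrt{P_{TV}(0, v)\, P_{TV}(1, v)} \\
&= \sum_{v \in \mathcal{V}} P_V(v) \sum_{t \in \{0,1\}} \sqrt{P_{T|V}(t|v)\big(1 - P_{T|V}(t|v)\big)} \\
&= \sum_{t \in \{0,1\}} \; \sum_{v \in \mathcal{V} : P_{T|V}(t|v) > 0} P_{TV}(t, v) \sqrt{\frac{1 - P_{T|V}(t|v)}{P_{T|V}(t|v)}}.
\end{aligned}
$$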
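The sampling-based form of the Bhattacharyya parameter in Lemma 11 can be illustrated with a short Monte Carlo sketch. The function names, the example distribution, and the sample count are assumptions made for illustration; the per-sample statistic is the square root of $(1 - P_{T|V}(t|v)) / P_{T|V}(t|v)$ from the last line of the proof.

```python
import math
import random


def bhattacharyya_exact(p_tv):
    """Z(T|V) = 2 * sum_v sqrt(P_TV(0, v) * P_TV(1, v)) for a pmf {(t, v): prob}."""
    values = {v for (_, v) in p_tv}
    return 2.0 * sum(
        math.sqrt(p_tv.get((0, v), 0.0) * p_tv.get((1, v), 0.0)) for v in values
    )


def likelihood_statistic(t, v, p_tv):
    """Per-sample ratio (1 - P(t|v)) / P(t|v), set to 0 in the degenerate case."""
    p_v = p_tv.get((0, v), 0.0) + p_tv.get((1, v), 0.0)
    p_t = p_tv.get((t, v), 0.0) / p_v
    if p_t == 0.0:
        return 0.0
    return (1.0 - p_t) / p_t


def bhattacharyya_monte_carlo(p_tv, num_samples, seed=0):
    """Average of sqrt(statistic) over i.i.d. samples (T, V) drawn from P_TV."""
    rng = random.Random(seed)
    pairs = list(p_tv.keys())
    weights = [p_tv[pair] for pair in pairs]
    draws = rng.choices(pairs, weights=weights, k=num_samples)
    return sum(math.sqrt(likelihood_statistic(t, v, p_tv)) for t, v in draws) / num_samples


# Hypothetical joint distribution of a binary T and a two-letter V.
example = {(0, "a"): 0.4, (1, "a"): 0.1, (0, "b"): 0.1, (1, "b"): 0.4}
exact = bhattacharyya_exact(example)
estimate = bhattacharyya_monte_carlo(example, num_samples=20000)
```

The estimate concentrates around the exact value as the number of samples grows; the sample size used here is arbitrary.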



Lemma 12 (Stochastic Degradation (cf. [23])): Consider discrete random variables $V$, $Y_1$, and $Y_2$. Assume that $|\mathcal{V}| = 2$ and that discrete alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$ have an arbitrary size. Then

$$
P_{Y_1|V}(y_1|v) \succ P_{Y_2|V}(y_2|v) \;\Rightarrow\; Z(V|Y_2) \ge Z(V|Y_1). \tag{100}
$$

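Proof: Beginning with the definition of the Bhattacharyya parameter leads to the following derivation:

$$
\begin{aligned}
Z(V|Y_2) &= 2 \sum_{y_2} \sqrt{P_{V Y_2}(0, y_2)\, P_{V Y_2}(1, y_2)} \\
&= 2 \sum_{y_2} \sqrt{P_V(0) P_V(1)} \sqrt{P_{Y_2|V}(y_2|0)\, P_{Y_2|V}(y_2|1)} \\
&= 2 \sqrt{P_V(0) P_V(1)} \sum_{y_2} \sqrt{\sum_{y_1} P_{Y_1|V}(y_1|0) P_{Y_2|Y_1}(y_2|y_1)} \sqrt{\sum_{y_1} P_{Y_1|V}(y_1|1) P_{Y_2|Y_1}(y_2|y_1)}.
\end{aligned}
$$

Then applying the Cauchy-Schwarz inequality yields

$$
\begin{aligned}
Z(V|Y_2) &\ge 2 \sqrt{P_V(0) P_V(1)} \sum_{y_2} \sum_{y_1} \sqrt{P_{Y_1|V}(y_1|0) P_{Y_2|Y_1}(y_2|y_1)} \sqrt{P_{Y_1|V}(y_1|1) P_{Y_2|Y_1}(y_2|y_1)} \\
&= 2 \sqrt{P_V(0) P_V(1)} \sum_{y_2} \sum_{y_1} P_{Y_2|Y_1}(y_2|y_1) \sqrt{P_{Y_1|V}(y_1|0)\, P_{Y_1|V}(y_1|1)}.
\end{aligned}
$$

Interchanging the order of summations yields

$$
\begin{aligned}
Z(V|Y_2) &\ge 2 \sqrt{P_V(0) P_V(1)} \sum_{y_1} \sqrt{P_{Y_1|V}(y_1|0)\, P_{Y_1|V}(y_1|1)} \sum_{y_2} P_{Y_2|Y_1}(y_2|y_1) \\
&= Z(V|Y_1).
\end{aligned}
$$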
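The ordering in Lemma 12 can be illustrated with a small numerical example. In the sketch below, a binary symmetric channel followed by a second binary symmetric channel plays the role of the degraded output $Y_2$; the helper names, the crossover probabilities, and the input distribution are assumptions chosen for illustration.

```python
import math


def bhattacharyya(p_v, channel):
    """Z(V|Y) = 2 * sum_y sqrt(P_VY(0, y) * P_VY(1, y)); channel[v][y] = P(y|v)."""
    return 2.0 * sum(
        math.sqrt(p_v[0] * channel[0][y] * p_v[1] * channel[1][y])
        for y in range(len(channel[0]))
    )


def cascade(first, second):
    """Transition matrix of the concatenation V -> Y1 -> Y2."""
    return [
        [
            sum(first[v][y1] * second[y1][y2] for y1 in range(len(second)))
            for y2 in range(len(second[0]))
        ]
        for v in range(len(first))
    ]


def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return [[1.0 - p, p], [p, 1.0 - p]]


p_v = [0.3, 0.7]                     # non-uniform binary input
to_y1 = bsc(0.1)                     # channel to the stronger output Y1
to_y2 = cascade(to_y1, bsc(0.2))     # Y2 is a degraded version of Y1
z1 = bhattacharyya(p_v, to_y1)
z2 = bhattacharyya(p_v, to_y2)
```

Here `z2` exceeds `z1`, consistent with (100): the degraded output is less informative about $V$.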



Lemma 13 (Successive Stochastic Degradation (cf. [23])): Consider a binary random variable $V$, and discrete random variables $Y_1$ with alphabet $\mathcal{Y}_1$, and $Y_2$ with alphabet $\mathcal{Y}_2$. Assume that the joint distribution $P_{V Y_1 Y_2}$ obeys the constraint $P_{Y_1|V}(y_1|v) \succ P_{Y_2|V}(y_2|v)$. Consider two i.i.d. random copies $(V^1, Y_1^1, Y_2^1)$ and $(V^2, Y_1^2, Y_2^2)$ distributed according to $P_{V Y_1 Y_2}$. Define two binary random variables $U^1 \triangleq V^1 \oplus V^2$ and $U^2 \triangleq V^2$. Then the following holds

$$
Z\big(U^1 \mid Y_2^{1:2}\big) \ge Z\big(U^1 \mid Y_1^{1:2}\big), \tag{101}
$$

$$
Z\big(U^2 \mid U^1, Y_2^{1:2}\big) \ge Z\big(U^2 \mid U^1, Y_1^{1:2}\big). \tag{102}
$$

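Proof: Given the assumptions, the following stochastic degradation conditions hold:

$$
P_{Y_1^1|V^1}(y_1^1|v^1) \succ P_{Y_2^1|V^1}(y_2^1|v^1), \tag{103}
$$

$$
P_{Y_1^2|V^2}(y_1^2|v^2) \succ P_{Y_2^2|V^2}(y_2^2|v^2). \tag{104}
$$

The goal is to derive new stochastic degradation conditions for the polarized conditional distributions. The binary random variables $U^1$ and $U^2$ are not necessarily independent Bernoulli($\frac{1}{2}$) variables. Taking this into account,

$$
\begin{aligned}
P_{Y_2^1 Y_2^2|U^1}(y_2^1, y_2^2|u^1)
&= \frac{1}{P_{U^1}(u^1)} \sum_{u^2 \in \{0,1\}} P_{V^1 Y_2^1}(u^1 \oplus u^2, y_2^1)\, P_{V^2 Y_2^2}(u^2, y_2^2) \\
&= \frac{1}{P_{U^1}(u^1)} \sum_{u^2 \in \{0,1\}} P_{Y_2^1|V^1}(y_2^1|u^1 \oplus u^2)\, P_{V^1}(u^1 \oplus u^2) \cdot P_{Y_2^2|V^2}(y_2^2|u^2)\, P_{V^2}(u^2).
\end{aligned}
$$

Applying the property due to the assumption in (103),

$$
= \frac{1}{P_{U^1}(u^1)} \sum_{u^2 \in \{0,1\}} P_{V^1}(u^1 \oplus u^2)\, P_{V^2}(u^2) \left[ \sum_{a \in \mathcal{Y}_1} P_{Y_1^1|V^1}(a|u^1 \oplus u^2)\, P_{Y_2^1|Y_1^1}(y_2^1|a) \right] \left[ \sum_{b \in \mathcal{Y}_1} P_{Y_1^2|V^2}(b|u^2)\, P_{Y_2^2|Y_1^2}(y_2^2|b) \right].
$$

Interchanging the order of summations and grouping the terms representing $P_{Y_1^1 Y_1^2|U^1}$ yields the following

$$
= \sum_{a \in \mathcal{Y}_1,\, b \in \mathcal{Y}_1} P_{Y_1^1 Y_1^2|U^1}(a, b|u^1)\, P_{Y_2^1|Y_1^1}(y_2^1|a)\, P_{Y_2^2|Y_1^2}(y_2^2|b).
$$

The above derivation proves that

$$
P_{Y_1^1 Y_1^2|U^1}(y_1^1, y_1^2|u^1) \succ P_{Y_2^1 Y_2^2|U^1}(y_2^1, y_2^2|u^1).
$$

Combined with Lemma 12, this concludes the proof for the ordering of the Bhattacharyya parameters given in (101). In a similar way, it is possible to show that

$$
\begin{aligned}
P_{Y_2^1 Y_2^2 U^1|U^2}(y_2^1, y_2^2, u^1|u^2)
&= \frac{1}{P_{U^2}(u^2)} P_{V^1 Y_2^1}(u^1 \oplus u^2, y_2^1)\, P_{V^2 Y_2^2}(u^2, y_2^2) \\
&= \frac{1}{P_{U^2}(u^2)} P_{Y_2^1|V^1}(y_2^1|u^1 \oplus u^2)\, P_{V^1}(u^1 \oplus u^2) \cdot P_{Y_2^2|V^2}(y_2^2|u^2)\, P_{V^2}(u^2).
\end{aligned}
$$

Applying the property due to the assumption in (104),

$$
P_{Y_2^1 Y_2^2 U^1|U^2}(y_2^1, y_2^2, u^1|u^2) = \frac{1}{P_{U^2}(u^2)} P_{V^1}(u^1 \oplus u^2)\, P_{V^2}(u^2) \left[ \sum_{a \in \mathcal{Y}_1} P_{Y_1^1|V^1}(a|u^1 \oplus u^2)\, P_{Y_2^1|Y_1^1}(y_2^1|a) \right] \left[ \sum_{b \in \mathcal{Y}_1} P_{Y_1^2|V^2}(b|u^2)\, P_{Y_2^2|Y_1^2}(y_2^2|b) \right].
$$

Interchanging the order of the terms and grouping the terms representing $P_{Y_1^1 Y_1^2 U^1|U^2}$ yields the following

$$
\begin{aligned}
P_{Y_2^1 Y_2^2 U^1|U^2}(y_2^1, y_2^2, u^1|u^2)
&= \sum_{a \in \mathcal{Y}_1,\, b \in \mathcal{Y}_1} P_{Y_1^1 Y_1^2 U^1|U^2}(a, b, u^1|u^2)\, P_{Y_2^1|Y_1^1}(y_2^1|a)\, P_{Y_2^2|Y_1^2}(y_2^2|b) \\
&= \sum_{a \in \mathcal{Y}_1,\, b \in \mathcal{Y}_1,\, c \in \{0,1\}} P_{Y_1^1 Y_1^2 U^1|U^2}(a, b, c|u^2)\, P_{Y_2^1|Y_1^1}(y_2^1|a)\, P_{Y_2^2|Y_1^2}(y_2^2|b)\, \mathbb{1}[u^1 = c].
\end{aligned}
$$

The above derivation proves that

$$
P_{Y_1^1 Y_1^2 U^1|U^2}(y_1^1, y_1^2, u^1|u^2) \succ P_{Y_2^1 Y_2^2 U^1|U^2}(y_2^1, y_2^2, u^1|u^2).
$$

Combined with Lemma 12, this concludes the proof for the ordering of the Bhattacharyya parameters given in (102). ■

Lemma 14 (Pinsker's Inequality): Consider two discrete probability measures $P(y)$ and $Q(y)$ for $y \in \mathcal{Y}$. The following inequality holds for a constant $\kappa = 2 \ln 2$.

$$
\sum_{y \in \mathcal{Y}} \big| P(y) - Q(y) \big| \le \sqrt{\kappa\, D\big(P(y) \,\|\, Q(y)\big)}.
$$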
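Pinsker's inequality in the form of Lemma 14, with the divergence measured in bits and $\kappa = 2 \ln 2$, can be spot-checked on random distributions. The sketch below is illustrative; the helper names, alphabet size, and number of trials are assumptions.

```python
import math
import random

KAPPA = 2.0 * math.log(2.0)  # constant of Lemma 14 for divergence in bits


def l1_distance(p, q):
    """Sum over y of |P(y) - Q(y)|."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kl_bits(p, q):
    """Kullback-Leibler distance D(P || Q) in bits; assumes Q(y) > 0 everywhere."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def random_pmf(rng, size):
    """Random strictly positive pmf on an alphabet of the given size."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def pinsker_holds(p, q):
    """Check the inequality with a small tolerance for rounding."""
    return l1_distance(p, q) <= math.sqrt(KAPPA * kl_bits(p, q)) + 1e-12


rng = random.Random(1)
checks = [pinsker_holds(random_pmf(rng, 4), random_pmf(rng, 4)) for _ in range(1000)]
```

Every trial satisfies the bound; a random check of this kind illustrates the inequality but does not prove it.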



Lemma 15 (Arıkan): Consider two discrete random variables $X \in \{0, 1\}$ and $Y \in \mathcal{Y}$. The Bhattacharyya parameter and conditional entropy are related as follows.

$$
Z(X|Y)^2 \le H(X|Y),
$$

$$
H(X|Y) \le \log_2\big(1 + Z(X|Y)\big).
$$



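Lemma 16 (Bhattacharyya vs. Entropy Parameters): Consider two discrete random variables $X \in \{0, 1\}$ and $Y \in \mathcal{Y}$. For any $0 < \delta < \frac{1}{2}$,

$$
Z(X|Y) \ge 1 - \delta \;\Rightarrow\; H(X|Y) \ge 1 - 2\delta,
$$

$$
Z(X|Y) \le \delta \;\Rightarrow\; H(X|Y) \le \log_2(1 + \delta).
$$

Proof: Due to Lemma 15, $H(X|Y) \ge Z(X|Y)^2 \ge (1 - \delta)^2 = 1 - 2\delta + \delta^2 \ge 1 - 2\delta$. It follows that if $Z(X|Y) \ge 1 - \delta$ and $\delta \to 0$, then $H(X|Y) \to 1$ as well. Similarly, due to Lemma 15, taking constant $\kappa = \frac{1}{\ln 2}$ and using the series expansion of $\log_2(1 + \delta)$, if $Z(X|Y) \le \delta$ then $H(X|Y) \le \log_2(1 + \delta) = \kappa \big( \delta - \frac{\delta^2}{2} + \frac{\delta^3}{3} - \cdots \big)$. It follows that if $Z(X|Y) \le \delta$ and $\delta \to 0$, then $H(X|Y) \to 0$ as well. ■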
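The two inequalities of Lemma 15, and the first implication of Lemma 16 that follows from them, can be checked numerically on random joint distributions of a binary $X$ and a discrete $Y$. The sketch below is illustrative; the helper names and the sampling scheme are assumptions.

```python
import math
import random


def binary_entropy(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def z_and_h(joint):
    """Return (Z(X|Y), H(X|Y)) where joint[x][y] = P_XY(x, y) and x is binary."""
    z = 0.0
    h = 0.0
    for p0, p1 in zip(joint[0], joint[1]):
        p_y = p0 + p1
        if p_y > 0:
            z += 2.0 * math.sqrt(p0 * p1)
            h += p_y * binary_entropy(p0 / p_y)
    return z, h


def random_joint(rng, num_outputs):
    """Random joint pmf of a binary X and a Y with num_outputs letters."""
    weights = [[rng.random() for _ in range(num_outputs)] for _ in range(2)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


rng = random.Random(7)
results = [z_and_h(random_joint(rng, 5)) for _ in range(1000)]
lemma15_ok = all(
    z * z <= h + 1e-12 and h <= math.log2(1.0 + z) + 1e-12 for z, h in results
)
# Lemma 16, first implication, with delta chosen as 1 - Z(X|Y) for each sample.
lemma16_ok = all(h >= 1.0 - 2.0 * (1.0 - z) - 1e-12 for z, h in results)
```

Both flags evaluate to true on these samples, matching the sandwich $Z(X|Y)^2 \le H(X|Y) \le \log_2(1 + Z(X|Y))$.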



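Appendix B
Proof of Lemma 1

The total variation bound of Lemma 1 is decomposed in a simple way due to the chain rule for Kullback-Leibler distance between discrete probability measures. The joint probability measures $P$ and $Q$ were defined in (8) and (19) respectively. According to definition, if $P(\{u_i^n\}_{i \in [m]}) > 0$ then $Q(\{u_i^n\}_{i \in [m]}) > 0$. Therefore the Kullback-Leibler distance $D(P \| Q)$ is well-defined and upper bounded as follows.

$$
\begin{aligned}
& D\big(P(\{u_i^n\}_{i \in [m]}) \,\big\|\, Q(\{u_i^n\}_{i \in [m]})\big) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} D\Big( P\big(u_i(j) \mid u_i^{1:j-1}, \{u_k^n\}_{k \in [1:i-1]}\big) \,\Big\|\, Q\big(u_i(j) \mid u_i^{1:j-1}, \{u_k^n\}_{k \in [1:i-1]}\big) \Big) && (105) \\
&= \sum_{i=1}^{m} \sum_{j \in \mathcal{M}_i^{(n)}} D\Big( P\big(u_i(j) \mid u_i^{1:j-1}, \{u_k^n\}_{k \in [1:i-1]}\big) \,\Big\|\, Q\big(u_i(j) \mid u_i^{1:j-1}, \{u_k^n\}_{k \in [1:i-1]}\big) \Big) && (106) \\
&= \sum_{i=1}^{m} \sum_{j \in \mathcal{M}_i^{(n)}} \Big[ 1 - H\big(U_i(j) \mid U_i^{1:j-1}, \{U_k^n\}_{k \in [1:i-1]}\big) \Big] && (107) \\
&= \sum_{i=1}^{m} \sum_{j \in \mathcal{M}_i^{(n)}} \Big[ 1 - H\big(U_i(j) \mid U_i^{1:j-1}, \{Y_k^n\}_{k \in [1:i-1]}\big) \Big] && (108) \\
&\le \sum_{i=1}^{m} 2 \delta_n \big| \mathcal{M}_i^{(n)} \big|. && (109)
\end{aligned}
$$

The equality in (105) is due to the chain rule for Kullback-Leibler distance. The equality in (106) is valid because for indices $j \notin \mathcal{M}_i^{(n)}$, $P\big(u_i(j) \mid u_i^{1:j-1}, \{u_k^n\}_{k \in [1:i-1]}\big) = Q\big(u_i(j) \mid u_i^{1:j-1}, \{u_k^n\}_{k \in [1:i-1]}\big)$. The equality in (107) is valid due to Lemma 10 and the fact that $Q\big(u_i(j) \mid u_i^{1:j-1}, \{u_k^n\}_{k \in [1:i-1]}\big) = \frac{1}{2}$ for indices $j \in \mathcal{M}_i^{(n)}$. The equality in (108) follows due to the one-to-one correspondence between variables $\{U_k^n\}_{k \in [1:i-1]}$ and $\{Y_k^n\}_{k \in [1:i-1]}$. The last inequality (109) follows from Lemma 16 due to the fact that $Z\big(U_i(j) \mid U_i^{1:j-1}, \{Y_k^n\}_{k \in [1:i-1]}\big) \ge 1 - \delta_n$ for indices $j \in \mathcal{M}_i^{(n)}$.

To finish the proof of Lemma 1,

$$
\begin{aligned}
\sum_{\{u_k^n\}_{k \in [m]}} \Big| P\big(\{u_k^n\}_{k \in [m]}\big) - Q\big(\{u_k^n\}_{k \in [m]}\big) \Big|
&\le \sqrt{\kappa\, D\Big( P\big(\{u_k^n\}_{k \in [m]}\big) \,\Big\|\, Q\big(\{u_k^n\}_{k \in [m]}\big) \Big)} && (110) \\
&\le \sqrt{\kappa \sum_{i=1}^{m} 2 \delta_n \big| \mathcal{M}_i^{(n)} \big|} && (111) \\
&\le \sqrt{(2\kappa)(m \cdot n)\big(2^{-n^{\beta'}}\big)}.
\end{aligned}
$$

The inequality in (110) is due to Pinsker's inequality given in Lemma 14. The inequality in (111) was proven in (109). Finally for $\beta' \in (\beta, \frac{1}{2})$, $\sqrt{(2\kappa)(m \cdot n)(2^{-n^{\beta'}})} < 2^{-n^{\beta}}$ for sufficiently large $n$. Hence the total variation distance is bounded by $O(2^{-n^{\beta}})$ for any $0 < \beta < \frac{1}{2}$.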
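The final step of the proof, which requires $n$ to be "sufficiently large", can be made concrete with a short numerical sketch. The comparison is carried out in the $\log_2$ domain to avoid floating-point underflow. The function name and the particular values of $m$, $\beta$, and $\beta'$ are assumptions chosen for illustration only.

```python
import math

KAPPA = 2.0 * math.log(2.0)  # constant from Pinsker's inequality (Lemma 14)


def final_step_holds(n, m, beta, beta_prime):
    """Check sqrt(2 * kappa * m * n * 2^(-n^beta')) < 2^(-n^beta) via log2."""
    lhs_log2 = 0.5 * (math.log2(2.0 * KAPPA * m * n) - n ** beta_prime)
    rhs_log2 = -(n ** beta)
    return lhs_log2 < rhs_log2


# Illustrative parameters: m = 2 users, beta = 0.3, beta' = 0.4.
small_n = final_step_holds(2 ** 10, m=2, beta=0.3, beta_prime=0.4)
large_n = final_step_holds(2 ** 20, m=2, beta=0.3, beta_prime=0.4)
```

For these parameters the inequality fails at $n = 2^{10}$ and holds at $n = 2^{20}$, which illustrates that the block length at which the bound takes effect can be large when $\beta'$ is close to $\beta$.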

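Appendix C
Superposition Coding

The total variation bound of Lemma 5 is decomposed in a simple way due to the chain rule for Kullback-Leibler distance between discrete probability measures. The joint probability measures $P$ and $Q$ were defined in (40) and (60) respectively. According to definition, if $P(u_1^n, u_2^n) > 0$ then $Q(u_1^n, u_2^n) > 0$. Therefore the Kullback-Leibler distance $D(P \| Q)$ is well-defined. Applying the chain rule, $D(P \| Q)$ is written as

$$
\begin{aligned}
D(P \| Q) = {} & \sum_{j=1}^{n} D\Big( P\big(u_1(j) \mid u_1^{1:j-1}, u_2^n\big) \,\Big\|\, Q\big(u_1(j) \mid u_1^{1:j-1}, u_2^n\big) \Big) \\
& + \sum_{j=1}^{n} D\Big( P\big(u_2(j) \mid u_2^{1:j-1}\big) \,\Big\|\, Q\big(u_2(j) \mid u_2^{1:j-1}\big) \Big).
\end{aligned}
$$

Applying Lemma 10, the one-to-one relation between $U_2^n$ and $V^n$, and Lemma 16 leads to the following result.

$$
\begin{aligned}
D(P \| Q) &= \sum_{j \in \mathcal{M}_1^{(n)}} \Big[ 1 - H\big(U_1(j) \mid U_1^{1:j-1}, V^n\big) \Big] + \sum_{j \in \mathcal{M}_2^{(n)}} \Big[ 1 - H\big(U_2(j) \mid U_2^{1:j-1}\big) \Big] \\
&\le 2 \delta_n \Big[ \big| \mathcal{M}_1^{(n)} \big| + \big| \mathcal{M}_2^{(n)} \big| \Big].
\end{aligned}
$$

Using identical arguments as applied in the proof of Lemma 1, the total variation distance between $P$ and $Q$ is bounded as $O(2^{-n^{\beta}})$.

To prove Lemma 6, the expectation of the average probability of error of the polarization-based superposition code is written as

$$
\mathbb{E}\Big[ P_e^{(n)}\big[\{\Psi_1^{(j)}, \Psi_2^{(j)}\}\big] \Big] = \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} \Bigg[ P^{\phi}_{Y_1^n Y_2^n \mid U_1^n U_2^n}(y_1^n, y_2^n \mid u_1^n, u_2^n) \cdot \frac{1}{2^{nR_1}} \prod_{j \in [n] : j \notin \mathcal{M}_1^{(n)}} \mathbb{P}\big\{\Psi_1^{(j)}(u_1^{1:j-1}, u_2^n G_n) = u_1(j)\big\} \cdot \frac{1}{2^{nR_2}} \prod_{j \in [n] : j \notin \mathcal{M}_2^{(n)}} \mathbb{P}\big\{\Psi_2^{(j)}(u_2^{1:j-1}) = u_2(j)\big\} \Bigg].
$$

From the definitions of the random boolean functions in (52) and in (53), it follows that

$$
\mathbb{P}\big\{\Psi_2^{(j)}(u_2^{1:j-1}) = u_2(j)\big\} = \mathbb{P}\big\{U_2(j) = u_2(j) \mid U_2^{1:j-1} = u_2^{1:j-1}\big\},
$$

and similarly for $\Psi_1^{(j)}$ with the additional conditioning on $u_2^n$. The expression for the expected average probability of error is then simplified by substituting the definition for $Q(u_1^n, u_2^n)$ provided in (60) as follows.

$$
\mathbb{E}\Big[ P_e^{(n)}\big[\{\Psi_1^{(j)}, \Psi_2^{(j)}\}\big] \Big] = \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} P^{\phi}_{Y_1^n Y_2^n \mid U_1^n U_2^n}(y_1^n, y_2^n \mid u_1^n, u_2^n)\, Q(u_1^n, u_2^n).
$$

The next step in the proof is to split the error term $\mathbb{E}\big[ P_e^{(n)}[\{\Psi_1^{(j)}, \Psi_2^{(j)}\}] \big]$ into two main parts, one part due to the error caused by polar decoding functions, and the other part due to the total variation distance between probability measures.

$$
\begin{aligned}
\mathbb{E}\Big[ P_e^{(n)}\big[\{\Psi_1^{(j)}, \Psi_2^{(j)}\}\big] \Big]
&= \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} P^{\phi}_{Y_1^n Y_2^n \mid U_1^n U_2^n}(y_1^n, y_2^n \mid u_1^n, u_2^n) \Big[ Q(u_1^n, u_2^n) - P_{U_1^n U_2^n}(u_1^n, u_2^n) + P_{U_1^n U_2^n}(u_1^n, u_2^n) \Big] \\
&\le \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} P_{U_1^n U_2^n Y_1^n Y_2^n}(u_1^n, u_2^n, y_1^n, y_2^n) + \sum_{u_1^n \in \{0,1\}^n} \sum_{u_2^n \in \{0,1\}^n} \Big| P_{U_1^n U_2^n}(u_1^n, u_2^n) - Q(u_1^n, u_2^n) \Big|.
\end{aligned} \tag{112}
$$

Lemma 5 established that the error term due to the total variation distance is upper bounded as $O(2^{-n^{\beta}})$. Therefore, it remains to upper bound the error term due to the polar decoding functions. Towards this end, note first that $\mathcal{T} = \mathcal{T}_{1v} \cup \mathcal{T}_1 \cup \mathcal{T}_2$, $\mathcal{T}_{1v} = \bigcup_j \mathcal{T}_{1v}^j$ for $j \in \mathcal{M}_2^{(n)} \subseteq \mathcal{M}_{1v}^{(n)}$, $\mathcal{T}_1 = \bigcup_j \mathcal{T}_1^j$ for $j \in \mathcal{M}_1^{(n)}$, and $\mathcal{T}_2 = \bigcup_j \mathcal{T}_2^j$ for $j \in \mathcal{M}_2^{(n)}$. It is convenient to bound each type of error bit by bit successively at both decoder $\mathcal{D}_1$ and $\mathcal{D}_2$ as follows.

$$
\begin{aligned}
\mathcal{E}_{1v}^j &\triangleq \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}_{1v}^j} P_{U_1^n U_2^n Y_1^n Y_2^n}(u_1^n, u_2^n, y_1^n, y_2^n) \\
&= \sum_{u_2^{1:j},\, y_1^n} P\big(u_2^{1:j}, y_1^n\big)\, \mathbb{1}\Big[ P\big(u_2(j) \mid u_2^{1:j-1}, y_1^n\big) \le P\big(u_2(j) \oplus 1 \mid u_2^{1:j-1}, y_1^n\big) \Big].
\end{aligned}
$$

In this form, it is possible to upper bound the error term $\mathcal{E}_{1v}^j$ with the corresponding Bhattacharyya parameter as follows,

$$
\begin{aligned}
\mathcal{E}_{1v}^j &= \sum_{u_2^{1:j},\, y_1^n} P\big(u_2^{1:j-1}, y_1^n\big)\, P\big(u_2(j) \mid u_2^{1:j-1}, y_1^n\big)\, \mathbb{1}\Big[ P\big(u_2(j) \mid u_2^{1:j-1}, y_1^n\big) \le P\big(u_2(j) \oplus 1 \mid u_2^{1:j-1}, y_1^n\big) \Big] \\
&\le \sum_{u_2^{1:j},\, y_1^n} P\big(u_2^{1:j-1}, y_1^n\big)\, P\big(u_2(j) \mid u_2^{1:j-1}, y_1^n\big) \sqrt{\frac{P\big(u_2(j) \oplus 1 \mid u_2^{1:j-1}, y_1^n\big)}{P\big(u_2(j) \mid u_2^{1:j-1}, y_1^n\big)}} \\
&= Z\big(U_2(j) \mid U_2^{1:j-1}, Y_1^n\big).
\end{aligned}
$$

Using identical arguments, the following upper bounds apply for the individual bit-by-bit error terms caused by successive decoding at both $\mathcal{D}_1$ and $\mathcal{D}_2$.

$$
\mathcal{E}_{1v}^j \le Z\big(U_2(j) \mid U_2^{1:j-1}, Y_1^n\big), \tag{113}
$$

$$
\mathcal{E}_{1}^j \le Z\big(U_1(j) \mid U_1^{1:j-1}, V^n, Y_1^n\big), \tag{114}
$$

$$
\mathcal{E}_{2}^j \le Z\big(U_2(j) \mid U_2^{1:j-1}, Y_2^n\big). \tag{115}
$$

Therefore, the total error due to decoding at the receivers is upper bounded as

$$
\begin{aligned}
\sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} P_{U_1^n U_2^n Y_1^n Y_2^n}(u_1^n, u_2^n, y_1^n, y_2^n)
&\le \sum_{j \in \mathcal{M}_2^{(n)}} Z\big(U_2(j) \mid U_2^{1:j-1}, Y_1^n\big) + \sum_{j \in \mathcal{M}_1^{(n)}} Z\big(U_1(j) \mid U_1^{1:j-1}, V^n, Y_1^n\big) + \sum_{j \in \mathcal{M}_2^{(n)}} Z\big(U_2(j) \mid U_2^{1:j-1}, Y_2^n\big) \\
&\le 3 n \delta_n.
\end{aligned}
$$

This concludes the proof demonstrating that the expected average probability of error is upper bounded as $O(2^{-n^{\beta}})$.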



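Appendix D
Marton Coding

To prove Lemma 9, the expectation of the average probability of error of the polarization-based Marton code is written as

$$
\mathbb{E}\Big[ P_e^{(n)}\big[\{\Psi_1^{(j)}\}, \{\Psi_2^{(j)}\}, \Gamma\big] \Big] = \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} \Bigg[ P^{\phi}_{Y_1^n Y_2^n \mid U_1^n U_2^n}(y_1^n, y_2^n \mid u_1^n, u_2^n) \cdot \frac{1}{2^{nR_1}} \prod_{j \in [n] : j \notin \mathcal{M}_1^{(n)}} \mathbb{P}\big\{\Psi_1^{(j)}(u_1^{1:j-1}) = u_1(j)\big\} \cdot \frac{1}{2^{nR_2}} \prod_{j \in \mathcal{H}_{V_2 \mid V_1}^{(n)} \setminus \mathcal{M}_2^{(n)}} \mathbb{P}\big\{\Gamma(j) = u_2(j)\big\} \cdot \prod_{j \in [n] : j \notin \mathcal{H}_{V_2 \mid V_1}^{(n)}} \mathbb{P}\big\{\Psi_2^{(j)}(u_2^{1:j-1}, u_1^n G_n) = u_2(j)\big\} \Bigg].
$$

The expression is then simplified by substituting the definition of $Q(u_1^n, u_2^n)$ provided in (92), and then splitting the error term into two parts:

$$
\begin{aligned}
\mathbb{E}\Big[ P_e^{(n)}\big[\{\Psi_1^{(j)}\}, \{\Psi_2^{(j)}\}, \Gamma\big] \Big]
&= \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} P^{\phi}_{Y_1^n Y_2^n \mid U_1^n U_2^n}(y_1^n, y_2^n \mid u_1^n, u_2^n)\, Q(u_1^n, u_2^n) \\
&\le \sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} P_{U_1^n U_2^n Y_1^n Y_2^n}(u_1^n, u_2^n, y_1^n, y_2^n) + \sum_{u_1^n \in \{0,1\}^n} \sum_{u_2^n \in \{0,1\}^n} \Big| P_{U_1^n U_2^n}(u_1^n, u_2^n) - Q(u_1^n, u_2^n) \Big|.
\end{aligned}
$$

The error term pertaining to the total variation distance was already upper bounded as in Lemma 8. The error due to successive cancelation decoding at the receivers is upper bounded as follows.

$$
\begin{aligned}
\sum_{(u_1^n, u_2^n, y_1^n, y_2^n) \in \mathcal{T}} P_{U_1^n U_2^n Y_1^n Y_2^n}(u_1^n, u_2^n, y_1^n, y_2^n)
&\le \sum_{j \in \mathcal{M}_1^{(n)}} Z\big(U_1(j) \mid U_1^{1:j-1}, Y_1^n\big) + \sum_{j \in \mathcal{L}_{V_2 \mid Y_2}^{(n)}} Z\big(U_2(j) \mid U_2^{1:j-1}, Y_2^n\big) \\
&\le 2 n \delta_n.
\end{aligned}
$$

This concludes the proof demonstrating that the expectation of the average probability of block error is upper bounded as $O(2^{-n^{\beta}})$.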

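Appendix E
Proof of Lemma 3

The implication in (30) follows since $X - Y_1 - Y_2$ means that $P_{Y_2|X}(y_2|x) = \sum_{y_1} P_{Y_1|X}(y_1|x) P_{Y_2|Y_1}(y_2|y_1)$. The implication in (31) follows by observing that

$$
\begin{aligned}
P_{Y_2|V}(y_2|v)
&= \sum_{y_1 \in \mathcal{Y}_1} P_{Y_1 Y_2|V}(y_1, y_2|v) \\
&= \sum_{y_1 \in \mathcal{Y}_1} \sum_{x \in \mathcal{X}} P_{X|V}(x|v)\, P_{Y_1 Y_2|X}(y_1, y_2|x) \\
&= \sum_{x \in \mathcal{X}} P_{X|V}(x|v) \sum_{y_1 \in \mathcal{Y}_1} P_{Y_1 Y_2|X}(y_1, y_2|x) \\
&= \sum_{x \in \mathcal{X}} P_{X|V}(x|v)\, P_{Y_2|X}(y_2|x) \\
&= \sum_{x \in \mathcal{X}} P_{X|V}(x|v) \sum_{y_1 \in \mathcal{Y}_1} P_{Y_1|X}(y_1|x)\, P_{Y_2|Y_1}(y_2|y_1) && (116) \\
&= \sum_{y_1 \in \mathcal{Y}_1} \sum_{x \in \mathcal{X}} P_{X|V}(x|v)\, P_{Y_1|X}(y_1|x)\, P_{Y_2|Y_1}(y_2|y_1) \\
&= \sum_{y_1 \in \mathcal{Y}_1} P_{Y_1|V}(y_1|v)\, P_{Y_2|Y_1}(y_2|y_1).
\end{aligned}
$$

In step (116), the assumed stochastic degraded condition $P_{Y_1|X}(y_1|x) \succ P_{Y_2|X}(y_2|x)$ ensures the existence of the distribution $P_{Y_2|Y_1}(y_2|y_1)$. The converse to (31) follows since it is possible to select $P_{X|V}(x|v) = \mathbb{1}[x = v]$ where the alphabet $\mathcal{V} = \mathcal{X}$. In this case, for any $v \in \mathcal{X}$,

$$
\begin{aligned}
P_{Y_2|V}(y_2|v) &= \sum_{x \in \mathcal{X}} P_{X|V}(x|v)\, P_{Y_2|X}(y_2|x) \\
&= \sum_{x \in \mathcal{X}} \mathbb{1}[x = v]\, P_{Y_2|X}(y_2|x) \\
&= P_{Y_2|X}(y_2|v).
\end{aligned}
$$

Similarly, $P_{Y_1|V}(y_1|v) = P_{Y_1|X}(y_1|v)$ for any $v \in \mathcal{X}$. Due to the assumed stochastic degradedness condition $P_{Y_2|V}(y_2|v) = \sum_{y_1} P_{Y_1|V}(y_1|v) P_{Y_2|Y_1}(y_2|y_1)$, for any $v \in \mathcal{X}$,

$$
\begin{aligned}
P_{Y_2|X}(y_2|v) &= P_{Y_2|V}(y_2|v) \\
&= \sum_{y_1} P_{Y_1|V}(y_1|v)\, P_{Y_2|Y_1}(y_2|y_1) \\
&= \sum_{y_1} P_{Y_1|X}(y_1|v)\, P_{Y_2|Y_1}(y_2|y_1).
\end{aligned}
$$

Therefore the stochastic degradedness property $P_{Y_1|X}(y_1|x) \succ P_{Y_2|X}(y_2|x)$ must hold as well. The statement of (31) means that Class I and Class II are equivalent as shown in Figure 3. The implication in (32) follows because assuming the stochastic degradedness property $P_{Y_1|V}(y_1|v) \succ P_{Y_2|V}(y_2|v)$ holds for all $P_{X|V}(x|v)$, there exists a $\tilde{Y}_1$ such that $V - \tilde{Y}_1 - Y_2$ form a Markov chain and $P_{\tilde{Y}_1|V}(\tilde{y}_1|v) = P_{Y_1|V}(y_1|v)$ for all $P_{X|V}(x|v)$. By the data processing inequality, $I(V; \tilde{Y}_1) \ge I(V; Y_2)$. If $P_{\tilde{Y}_1|V}(\tilde{y}_1|v) = P_{Y_1|V}(y_1|v)$, then $P_{V \tilde{Y}_1}(v, \tilde{y}_1) = P_{V Y_1}(v, y_1)$ for all $P_V(v)$. It follows that for all $P_{VX}(v, x)$, the mutual information $I(V; \tilde{Y}_1) = I(V; Y_1)$. The implication in (33) follows by setting $P_{VX}(v, x) = \mathbb{1}[v = x] P_X(x)$ and letting $\mathcal{V} = \mathcal{X}$. Then for any $v \in \mathcal{X}$,

$$
\begin{aligned}
P_{V Y_1}(v, y_1) &= \sum_{x \in \mathcal{X}} P_{VX}(v, x)\, P_{Y_1|X}(y_1|x) \\
&= \sum_{x \in \mathcal{X}} \mathbb{1}[v = x]\, P_X(x)\, P_{Y_1|X}(y_1|x) \\
&= P_X(v)\, P_{Y_1|X}(y_1|v) \\
&= P_{X Y_1}(v, y_1).
\end{aligned}
$$

Similarly for any $v \in \mathcal{X}$, $P_{V Y_2}(v, y_2) = P_{X Y_2}(v, y_2)$. Therefore for the particular choice of $P_{VX}(v, x) = \mathbb{1}[v = x] P_X(x)$, $I(V; Y_1) = I(X; Y_1)$ and $I(V; Y_2) = I(X; Y_2)$. The converse statements for (30), (32), and (33) do not hold due to a counterexample involving a DM-BC comprised of a binary erasure channel BEC($\epsilon$) and a binary symmetric channel BSC($p$) as described in Example 3.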